IMPROVED EPITOPE DISPLAYING PHAGE

BACKGROUND OF THE INVENTION

Field of the Invention
This invention relates to the expression and display of
libraries of mutated epitopic peptides or potential binding
protein domains on the surface of phage, and the screening of
those libraries to identify high affinity species.

Information Disclosure Statement

The amino acid sequence of a protein determines its
three-dimensional (3D) structure, which in turn determines
protein function. Some residues on the polypeptide chain are
more important than others in determining the 3D structure of
a protein. Substitutions of amino acids that are exposed to
solvent are less likely to affect the 3D structure than are
substitutions at internal loci.

"Protein engineering" is the art of manipulating the sequence
of a protein in order to alter its binding characteristics.
The factors affecting protein binding are known, but designing
new complementary surfaces has proven difficult.

With the development of recombinant DNA techniques, it became
possible to obtain a mutant protein by mutating the gene
encoding the native protein and then expressing the mutated
gene. Several mutagenesis strategies are known. One, "protein
surgery", involves the introduction of one or more
predetermined mutations within the gene of choice; a single
polypeptide of completely predetermined sequence is expressed,
and its binding characteristics are evaluated.

At the other extreme is random mutagenesis by means of
relatively nonspecific mutagens such as radiation and various
chemical agents.

It is possible to randomly vary predetermined nucleotides
using a mixture of bases in the appropriate cycles of a
nucleic acid synthesis procedure. The proportion of bases in
the mixture, for each position of a codon, will determine the
frequency at which each amino acid will occur in the
polypeptides expressed from the degenerate DNA population.
Oliphant et al. (OLIP86) and Oliphant and Struhl (OLIP87) have
demonstrated ligation and cloning of highly degenerate
oligonucleotides, which were used in the mutation of
promoters. They suggested that similar methods could be used
in the variation of protein coding regions. They do not say
how one should: a) choose protein residues to vary, or b)
select or screen mutants with desirable properties.
Reidhaar-Olson and Sauer (REID88a) have used synthetic
degenerate oligonucleotides to vary simultaneously two or
three residues through all twenty amino acids. See also
Vershon et al. (VERS86a; VERS86b). Reidhaar-Olson and Sauer do
not discuss the limits on how many residues could be varied at
once, nor do they mention the problem of unequal abundance of
DNA encoding different amino acids.

A number of researchers have directed unmutated foreign
antigenic epitopes to the surface of phage, fused to a native
phage surface protein, and demonstrated that the epitopes were
recognized by antibodies.

Dulbecco (DULB86) suggests a procedure for incorporating a
foreign antigenic epitope into a viral surface protein so that
the expressed chimeric protein is displayed on the surface of
the virus in a manner such that the foreign epitope is
accessible to antibody. In 1985 Smith (SMIT85) reported
inserting a nonfunctional segment of the EcoRI endonuclease
gene into gene III of bacteriophage f1, "in phase". The gene
III protein is a minor coat protein necessary for infectivity.
Smith demonstrated that the recombinant phage were adsorbed by
immobilized antibody raised against the EcoRI endonuclease,
and could be eluted with acid. De la Cruz et al. (DELA88) have
expressed a fragment of the repeat region of the
circumsporozoite protein from Plasmodium falciparum on the
surface of M13 as an insert in the gene III protein. They
showed that the recombinant phage were both antigenic and
immunogenic in rabbits, and that such recombinant phage could
be used for B epitope mapping. The researchers suggest that
similar recombinant phage could be used for T epitope mapping
and for vaccine development.

McCafferty et al. (MCCA90) expressed a fusion of an Fv
fragment of an antibody to the N-terminal of the pIII protein.
The Fv fragment was not mutated.

Ladner, Glick, and Bird, WO88/06630 (publ. 7 Sept. 1988 and
having priority from US application 07/021,046, assigned to
Genex Corp.) (LGB) speculate that diverse single chain
antibody domains (SCAD) may be screened for binding to a
particular antigen by varying the DNA encoding the combining
determining regions of a single chain antibody, subcloning the
SCAD gene into the gpV gene of phage lambda so that a SCAD/gpV
chimera is displayed on the outer surface of phage lambda, and
selecting phage which bind to the antigen through affinity
chromatography.

Parmley and Smith (PARM88) suggested that an epitope library
that exhibits all possible hexapeptides could be constructed
and used to isolate epitopes that bind to antibodies. In
discussing the epitope library, the authors did not suggest
that it was desirable to balance the representation of
different amino acids. Nor did they teach that the insert
should encode a complete domain of the exogenous protein.
Epitopes are considered to be unstructured peptides as opposed
to structured proteins. Scott and Smith (SCOT90) and Cwirla et
al. (CWIR90) prepared "epitope libraries" in which potential
hexapeptide epitopes for a target antibody were randomly
mutated by fusing degenerate oligonucleotides, encoding the
epitopes, with gene III of fd phage, and expressing the fused
gene in phage-infected cells. The cells manufactured fusion
phage which displayed the epitopes on their surface; the phage
which bound to immobilized antibody were eluted with acid and
studied. Devlin et al. (DEVL90) similarly screened, using M13
phage, for random 15 residue epitopes recognized by
streptavidin.
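
The scale of such libraries follows from simple counting, with
twenty possible amino acids at each varied position. The short
Python sketch below is offered only as an illustration; the
function name is hypothetical and nothing in it is taken from
the cited references.

```python
# Illustrative only: count the distinct peptides in a fully
# randomized library with 20 possible amino acids per position.
AMINO_ACIDS = 20


def peptide_diversity(length: int) -> int:
    """Number of distinct peptides of the given length."""
    return AMINO_ACIDS ** length


hexapeptides = peptide_diversity(6)    # libraries of six varied residues
fifteen_mers = peptide_diversity(15)   # libraries of fifteen varied residues

print(f"hexapeptides: {hexapeptides:.3e}")
print(f"15-mers:      {fifteen_mers:.3e}")
```

A hexapeptide library therefore spans 64 million sequences,
while a 15 residue library spans far more sequences than any
practical phage population can sample exhaustively.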
The Scott and Smith, Cwirla et al., and Devlin et al.
libraries provided a highly biased sampling of the possible
amino acids at each position. Their primary concern in
designing the degenerate oligonucleotide encoding their
variable region was to ensure that all twenty amino acids were
encodible at each position; a secondary consideration was
minimizing the frequency of occurrence of stop signals.
Consequently, Scott and Smith and Cwirla et al. employed NNK
(N = equal mixture of G, A, T, C; K = equal mixture of G and
T) while Devlin et al. used NNS (S = equal mixture of G and
C). There was no attempt to minimize the frequency ratio of
most favored-to-least favored amino acid, or to equalize the
rate of occurrence of acidic and basic amino acids.
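
The bias of these schemes can be checked by enumerating the
codons each one permits. The following Python sketch assumes
the standard genetic code and equimolar base mixtures; the
names `residue_counts` and `bias_summary` are hypothetical, and
histidine is left out of the "basic" tally for simplicity.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# at each of the three positions.
BASES = "TCAG"
RESIDUES = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: RESIDUES[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Degenerate base symbols named in the text (assumed equimolar).
MIXTURES = {"N": "GATC", "K": "GT", "S": "GC"}


def residue_counts(scheme: str) -> Counter:
    """Codons per encoded residue for a scheme such as 'NNK' ('*' = stop)."""
    pools = [MIXTURES[symbol] for symbol in scheme]
    return Counter(CODON_TABLE["".join(c)] for c in product(*pools))


def bias_summary(scheme: str) -> dict:
    """Summarize how evenly a degenerate codon samples the amino acids."""
    counts = residue_counts(scheme)
    total = sum(counts.values())
    amino = {aa: n for aa, n in counts.items() if aa != "*"}
    return {
        "codons": total,
        "amino_acids_encoded": len(amino),
        "stop_fraction": counts["*"] / total,
        "most_to_least_ratio": max(amino.values()) / min(amino.values()),
        "acidic_codons": counts["D"] + counts["E"],
        "basic_codons": counts["K"] + counts["R"],  # His not counted
    }


for scheme in ("NNK", "NNS"):
    print(scheme, bias_summary(scheme))
```

Under these assumptions both schemes allow 32 codons, one of
which is a stop signal; the most favored amino acids have three
codons each and the least favored one, and codons for Lys and
Arg outnumber those for Asp and Glu by four to two.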

Devlin et al. characterized several affinity-selected
streptavidin-binding peptides, but did not measure the
affinity constants for these peptides. Cwirla et al. did
determine the affinity constants for their peptides, but were
disappointed to find that their best hexapeptides had
affinities (350-300nM) "orders of magnitude" weaker than that
of the native Met-enkephalin epitope (7nM) recognized by the
target antibody. Cwirla et al. speculated that phage bearing
peptides with higher affinities remained bound under acidic
elution, possibly because of multivalent interactions between
phage (carrying about 4 copies of pIII) and the divalent
target IgG. Scott and Smith were able to find peptides whose
affinity for the target antibody (A2) was comparable to that
of the reference myohemerythrin epitope (50nM). However, Scott
and Smith likewise expressed concern that some high-affinity
peptides were lost, possibly through irreversible binding of
fusion phage to target.

Ladner, et al., WO90/02809, incorporated by reference herein,
describe a process for the generation and identification of
novel binding proteins having affinity for a predetermined
target. In this process, a gene encoding a potential binding
domain (as distinct from a mere epitopic peptide), said gene
being obtained by random mutagenesis of a limited number of
predetermined codons, is fused to a genetic element which
causes the resulting chimeric expression product to be
displayed on the outer surface of a virus (especially a
filamentous phage) or a cell. Chromatographic selection is
then used to identify viruses or cells whose genome includes
such a fused gene which coded for the protein which bound to
the chromatographic target. Ladner, et al. discuss several
methods of recovering the gene of interest when the viruses or
cells are so tightly bound to the target that they cannot be
washed off in viable form. These are growing them in situ on
the chromatographic matrix, fragmenting the matrix and using
it as an inoculant into a culture vessel, degrading the
linkage between the matrix and the target material, and
degrading the viruses or cells but then recovering their DNA.
However, these methods will also recover viruses or cells
which are nonspecifically bound to the target material.

WO90/02809 also addressed strategies for mutagenesis,
including one which provides all twenty amino acids in
substantially equal proportions, but only in the context of
mutagenesis of protein domains, not epitopic peptides.



The present invention is intended to overcome the deficiencies
discussed above. In one embodiment of the invention, a library
of "display phage" is used to identify binding domains with a
high affinity for a predetermined target. Potential binding
domains are displayed on the surface of the phage. This is
achieved by expressing a fused gene which encodes a chimeric
outer surface protein comprising the potential binding domain
and at least a functional portion of a coat protein native to
the phage. The preferred embodiment uses a pattern of
semirandom mutagenesis, called "variegation", that focuses
mutations into those residues of a parental binding domain
that are most likely to affect its binding properties and are
least likely to destroy its underlying structure. As a result,
while any one phage displays only a single foreign binding
domain (though possibly in multiple copies), the phage library
collectively displays thousands, even millions, of different
binding domains. The phage library is screened by affinity
separation techniques to identify those phage bearing
successful (high affinity) binding domains, and these phage
are recovered and characterized to determine the sequence of
the successful binding domains. These successful binding
domains may then serve as the parental binding domains for
another round of variegation and affinity separation.

In another embodiment of the invention, the display phage
display on their surface a chimeric outer surface protein
comprising a functional portion of a native outer surface
protein and a potential epitope. In an epitope library made of
these display phage, the region corresponding to the foreign
epitope is hypervariable. The library is screened with an
antibody or other binding protein of interest and high
affinity epitopes are identified. References to display,
mutagenesis and screening of potential binding domains should
be taken to apply, mutatis mutandis, to display, mutagenesis
and screening of potential epitopes, unless stated otherwise.

As previously mentioned, when several copies of the chimeric
coat protein are displayed on a single phage, there is a risk
that irreversible binding will occur, especially if the target
is multivalent (as with an antibody). In this case, the phage
last eluted by an elution gradient will not be the ones
bearing the highest affinity epitopes or binding domains, but
rather will be those having an affinity high enough to hold on
to the target under the initial elution conditions but not so
high as to bind irreversibly. As a result, methodology known
in the art may fail to recover very high affinity epitopes or
binding domains, which for many purposes are the most
desirable species.

We propose to cope with the problem of irreversible binding by
incorporating into the chimeric coat protein a linker
sequence, between the foreign epitope or binding domain and
the sequence native to the wild-type phage coat protein, which
is cleavable by a site-specific protease. In this case, the
phage library is incubated with the immobilized target. Lower
affinity phage are eluted off the target and only the solid
phase (bearing the high affinity phage) is retained. The
aforementioned linker sequence is cleaved, and the phage
particles are released, leaving the bound epitope or binding
domain behind. One may then recover the particles (and
sequence their DNA to determine the sequence of the
corresponding epitope or binding domain) or the bound peptide
(and sequence its amino acids directly). The former recovery
method is preferred, as the encoding DNA may be amplified in
vitro using PCR or in vivo by transfecting suitable host cells
with the high affinity display phage. While the production of
fusion proteins with cleavable linkers is known in the art,
the use of such linkers to facilitate controlled cleavage of a
chimeric coat protein of a phage has not previously been
reported.

Another method of addressing irreversible binding is
appropriate when the binding domains are "mini-proteins",
i.e., relatively small peptides whose stability of structure
is primarily attributable to the presence of one or more
covalent crosslinks, e.g., disulfide bonds. As in the example
above, low affinity phage are removed first. The remaining,
high affinity, bound phage are then treated with a reagent
which breaks the crosslink, such as dithiothreitol in the case
of a domain with disulfide bonds, but which does not cleave
peptidyl bonds or modify the side chains of amino acids which
are not crosslinked. This will usually result in sufficient
denaturation to either release the phage outright or to permit
their elution by other means.

These two methods, of course, are not mutually exclusive.

In the previously known epitope display phage libraries, the
phage genome was altered by replacing the gene encoding the
wild-type gene III protein of M13 with one encoding a chimeric
coat protein. As a result, the five normal copies of the wild
type gene III protein were all replaced by the chimeric coat
protein, whereby each phage had five potential binding sites
for the target, and hence a very high potential avidity. With
high affinity epitopes (or binding domains), this might well
contribute to irreversible binding.

One method of the present invention of reducing the avidity of
display phage, especially epitope display phage, for their
target, and hence of alleviating the problem of irreversible
binding, is to engineer the phage to contain two genes that
each express a coat protein, one encoding the wild type coat
protein, and the other the cognate chimeric coat protein.
Thus, phage bearing identical epitopes or binding domains may
yet bear different ratios of wild type to chimeric coat
protein molecules, and hence have different avidities. The
average ratio for the library will be dependent on the
relative levels of expression of the two cognate genes.
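
How the copy number might vary from phage to phage can be
sketched with a simple probabilistic model. The Python example
below assumes, purely for illustration, that each of the five
gene III protein positions is filled independently and that a
chimeric molecule is chosen with probability equal to its share
of the expressed coat protein pool. Neither this binomial model
nor the function names come from the specification.

```python
from math import comb


def chimeric_copy_distribution(copies: int, chimeric_fraction: float) -> list[float]:
    """Probability of k chimeric copies per phage, for k = 0..copies.

    Assumes independent incorporation at each position (binomial model).
    """
    if not 0.0 <= chimeric_fraction <= 1.0:
        raise ValueError("chimeric_fraction must lie between 0 and 1")
    p = chimeric_fraction
    return [
        comb(copies, k) * p**k * (1.0 - p) ** (copies - k)
        for k in range(copies + 1)
    ]


def mean_chimeric_copies(copies: int, chimeric_fraction: float) -> float:
    """Expected number of chimeric copies per phage under the same model."""
    return copies * chimeric_fraction


# Hypothetical case: chimeric protein is 20% of the expressed pool.
dist = chimeric_copy_distribution(copies=5, chimeric_fraction=0.2)
for k, prob in enumerate(dist):
    print(f"{k} chimeric copies: {prob:.4f}")
print("mean:", mean_chimeric_copies(5, 0.2))
```

In this hypothetical case most phage would carry zero, one or
two chimeric copies, so lowering the relative expression of the
chimeric gene shifts the population toward low valency.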

It may be advantageous to be able to modulate the ratio of the
chimeric coat protein to its cognate wild-type coat protein.
For example, early in the evolutionary process, the affinity
of the binding domains for their target may be rather low,
especially if they are based on a parental binding domain
which has no affinity for the target.

Modulation may be achieved by placing the chimeric gene under
the control of a regulatable promoter.

While it may be possible to place the cognate wild-type gene
under the control of a second, differently regulated promoter,
this may be impracticable if, as with the M13 gene III, the
gene is part of a polycistronic operon. In this case,
expression of the wild-type gene may be reduced by replacing
its methionine initiation codon with a leucine initiation
codon.

M13 gene III, as previously noted, encodes one of the minor
coat proteins of this filamentous phage (five copies per
phage). In view of the difficulties with irreversible binding
reported by those modifying this gene so that a foreign
epitope is displayed on the phage coat, use of the M13 major
coat protein was clearly discouraged. However, we have found
that chimeric major coat proteins are in fact useful for
displaying potential binding domains for screening purposes
even though (indeed, sometimes because) there are over a
thousand copies of this protein per phage. It is believed that
the major (VIII) coat protein would likewise be useful in
constructing an epitope phage library.

We have also developed a linker suitable for attaching
potential binding domains (or epitopic peptides) to this major
coat protein, and perhaps to other proteins as well.

Finally, to the extent that some of the problems experienced
with epitope libraries have been attributable to the use of
patterns of mutagenesis which lead to highly biased
allocations of amino acids, the present invention is also
directed to a variety of improved patterns that lead to less
biased and hence more efficient epitope phage libraries.


BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows how a phage may be used as a genetic phage. At
(a) we have a wild-type precoat protein lodged in the lipid
bilayer. The signal peptide is in the periplasmic space. At
(b), a chimeric precoat protein, with a potential binding
domain interposed between the signal peptide and the mature
coat protein sequence, is similarly trapped. At (c) and (d),
the signal peptide has been cleaved off the wild-type and
chimeric proteins, respectively, but certain residues of the
coat protein sequence interact with the lipid bilayer to
prevent the mature protein from passing entirely into the
periplasm. At (e) and (f), mature wild-type and chimeric
protein are assembled into the coat of a single stranded DNA
phage as it emerges into the periplasmic space. The phage will
pass through the outer membrane into the medium where it can
be recovered and chromatographically evaluated.

Figure 2 shows the Cαs of the coat protein of phage f1.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

I. DISPLAY STRATEGY 

A. General Considerations 

The present invention contemplates that a potential binding
domain (pbd) or a potential epitope will be displayed on the
surface of a phage in the form of a fusion with a coat (outer
surface) protein (OSP) of the phage. This chimeric outer
surface protein is the processed product of the polypeptide
expressed by a display gene inserted into the phage genome;
therefore: 1) the genome of the phage must allow introduction
of the display gene either by tolerating additional genetic
material or by having replaceable genetic material; 2) the
virion must be capable of packaging the genome after accepting
the insertion or substitution of genetic material, and 3) the
display of the OSP-IPBD protein on the phage surface must not
disrupt virion structure sufficiently to interfere with phage
propagation.

When the viral particle is assembled, its coat proteins may
attach themselves to the phage: a) from the cytoplasm, b) from
the periplasm, or c) from within the lipid bilayer. The
immediate expression product of the display gene must feature,
at its amino terminal, a functional secretion signal peptide,
such as the phoA signal (MKQST...), if the coat protein
attaches to the phage from the periplasm or from within the
lipid bilayer. If a secretion signal is necessary for the
display of the potential binding domain, in an especially
preferred embodiment the bacterial cell in which the hybrid
gene is expressed is of a "secretion-permissive" strain.

The DNA sequence encoding the foreign epitope or binding
domain should precede the sequence encoding the coat protein
proper if the amino terminal of the processed coat protein is
normally its free end, and should follow it if the carboxy
terminal is the normal free end.

The morphogenetic pathway of the phage determines the
environment in which the IPBD will have opportunity to fold.
Periplasmically assembled phage are preferred when IPBDs
contain essential disulfides, as such IPBDs may not fold
within a cell (these proteins may fold after the phage is
released from the cell). Intracellularly assembled phage are
preferred when the IPBD needs large or insoluble prosthetic
groups (such as Fe4S4 clusters), since the IPBD may not fold
if secreted because the prosthetic group is lacking.

When variegation is introduced, multiple infections could
generate hybrid GPs that carry the gene for one PBD but have
at least some copies of a different PBD on their surfaces; it
is preferable to minimize this possibility by infecting cells
with phage under conditions resulting in a low multiplicity of
infection (MOI).
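
The benefit of a low MOI can be estimated under the common
assumption, not stated in this specification, that the number
of phage infecting a given cell follows a Poisson distribution
with mean equal to the MOI. The Python sketch below uses
hypothetical function names and is meant only as a worked
illustration.

```python
from math import exp, factorial


def infection_probability(moi: float, phage_per_cell: int) -> float:
    """Poisson probability that a cell receives exactly this many phage."""
    return exp(-moi) * moi**phage_per_cell / factorial(phage_per_cell)


def multiply_infected_fraction(moi: float) -> float:
    """Among infected cells, the fraction that received two or more phage.

    Requires moi > 0, otherwise no cell is infected at all.
    """
    if moi <= 0:
        raise ValueError("moi must be positive")
    p0 = infection_probability(moi, 0)
    p1 = infection_probability(moi, 1)
    return (1.0 - p0 - p1) / (1.0 - p0)


for moi in (0.01, 0.1, 1.0):
    fraction = multiply_infected_fraction(moi)
    print(f"MOI {moi}: {fraction:.3f} of infected cells carry 2+ phage")
```

Under this model the proportion of infected cells that could
produce hybrid GPs falls roughly in proportion to the MOI once
the MOI is well below one.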

For a given bacteriophage, the preferred OSP is usually one
that is present on the phage surface in the largest number of
copies, as this allows the greatest flexibility in varying the
ratio of OSP-IPBD to wild type OSP and also gives the highest
likelihood of obtaining satisfactory affinity separation.
Moreover, a protein present in only one or a few copies
usually performs an essential function in morphogenesis or
infection; mutating such a protein by addition or insertion is
likely to result in reduction in viability of the GP.
Nevertheless, an OSP such as M13 gIII protein may be an
excellent choice as OSP to cause display of the PBD.

It is preferred that the wild-type osp gene be preserved. The
ipbd gene fragment may be inserted either into a second copy
of the recipient osp gene or into a novel engineered osp gene.
It is preferred that the osp-ipbd gene be placed under control
of a regulated promoter. Our process forces the evolution of
the PBDs derived from IPBD so that some of them develop a
novel function, viz. binding to a chosen target. Placing the
gene that is subject to evolution on a duplicate gene is an
imitation of the widely-accepted scenario for the evolution of
protein families. It is now generally accepted that gene
duplication is the first step in the evolution of a protein
family from an ancestral protein. By having two copies of a
gene, the affected physiological process can tolerate
mutations in one of the genes. This process is well understood
and documented for the globin family (cf. DICK83, p65ff, and
CREI84, p117-125).

The user must choose a site in the candidate OSP gene for
inserting an ipbd gene fragment. The coats of most
bacteriophage are highly ordered. Filamentous phage can be
described by a helical lattice; isometric phage, by an
icosahedral lattice. Each monomer of each major coat protein
sits on a lattice point and makes defined interactions with
each of its neighbors. Proteins that fit into the lattice by
making some, but not all, of the normal lattice contacts are
likely to destabilize the virion by: a) aborting formation of
the virion, b) making the virion unstable, or c) leaving gaps
in the virion so that the nucleic acid is not protected. Thus
in bacteriophage, it is important to retain in engineered
OSP-IPBD fusion proteins those residues of the parental OSP
that interact with other proteins in the virion. For M13
gVIII, we prefer to retain the entire mature protein, while
for M13 gIII, it might suffice to retain the last 100 residues
(or even fewer). Such a truncated gIII protein would be
expressed in parallel with the complete gIII protein, as gIII
protein is required for phage infectivity.

The display gene is placed downstream of a known promoter,
preferably a regulated promoter such as lacUV5, tac, or

Filamentous Phages 

The filamentous phages, which include M13, f1, fd, If1, Ike,
Xf, Pf1, and Pf3, are of particular interest. The entire life
cycle of the filamentous phage M13, a common cloning and
sequencing vector, is well understood. The genetic structure
(SCHA78) of M13 is well known as is the physical structure of
the virion (BANN81, BOEK80, CHAN79, ITOK79, KAPL78, KUHN85b,
KUHN87, MAKO80, MARV78, MESS78, OHKA81, RASC86, RUSS81,
SCHA78, SMIT85, WEBS78, and ZIMM82); see RASC86 for a recent
review of the structure and function of the coat proteins.

Marvin and collaborators (MARV78, MAKO80, BANN81) have
determined an approximate 3D virion structure of the closely
related phage f1 by a combination of genetics, biochemistry,
and X-ray diffraction from fibers of the virus. Figure 2 is
drawn after the model of Banner et al. (BANN81) and shows only
the Cαs of the protein. The apparent holes in the cylindrical
sheath are actually filled by protein side groups so that the
DNA within is protected. The amino terminus of each protein
monomer is to the outside of the cylinder, while the carboxy
terminus is at smaller radius, near the DNA. Although other
filamentous phages (e.g. Pf1 or Ike) have different helical
symmetry, all have coats composed of many short α-helical
monomers with the amino terminus of each monomer on the virion
surface.

1. M13 Major Coat Protein (gVIII) 

The major coat protein of M13 is encoded by gene VIII. The 50
amino acid mature gene VIII coat protein is synthesized as a
73 amino acid precoat (SCHA78; ITOK79). The first 23 amino
acids constitute a typical signal-sequence which causes the
nascent polypeptide to be inserted into the inner cell
membrane. Whether the precoat inserts into the membrane by
itself or through the action of host secretion components,
such as SecA and SecY, remains controversial, but has no
effect on the operation of the present invention.

An E. coli signal peptidase (SP-I) recognizes amino acids 18,
21, and 23, and, to a lesser extent, residue 22, and cuts
between residues 23 and 24 of the precoat (KUHN85a, KUHN85b,
OLIV87). After removal of the signal sequence, the amino
terminus of the mature coat is located on the periplasmic side
of the inner membrane; the carboxy terminus is on the
cytoplasmic side. About 3000 copies of the mature 50 amino
acid coat protein associate side-by-side in the inner
membrane.

We have constructed a tripartite gene comprising:

1) DNA encoding a signal sequence directing secretion of
   parts (2) and (3) through the inner membrane,

2) DNA encoding the mature BPTI sequence, and

3) DNA encoding the mature M13 gVIII protein.

This gene causes BPTI to appear in active form on the surface
of M13 phage.
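
The arrangement of the three parts, and the effect of signal
peptidase cleavage, can be pictured with a small Python sketch.
The residue strings below are placeholders rather than real
sequences, the insert length is arbitrary, and the function
names are hypothetical; only the lengths of 23 and 50 residues
are taken from the description of the gVIII precoat above.

```python
SIGNAL_LENGTH = 23   # residues removed by signal peptidase (from the text)
MATURE_LENGTH = 50   # residues in the mature gVIII coat protein (from the text)


def build_precoat(signal: str, insert: str, mature: str) -> str:
    """Join signal peptide, displayed domain and mature coat, N to C terminus."""
    return signal + insert + mature


def process_precoat(precoat: str, signal_length: int = SIGNAL_LENGTH) -> str:
    """Model cleavage after the last signal residue (between 23 and 24)."""
    return precoat[signal_length:]


# Placeholder residues: 's' = signal, 'b' = displayed domain, 'm' = mature coat.
signal = "s" * SIGNAL_LENGTH
mature = "m" * MATURE_LENGTH
insert = "b" * 58  # arbitrary placeholder length for the displayed domain

wild_type = process_precoat(build_precoat(signal, "", mature))
chimeric = process_precoat(build_precoat(signal, insert, mature))

print(len(wild_type))   # length of the processed wild-type coat protein
print(len(chimeric))    # length of the processed chimeric coat protein
```

Because the displayed domain sits between the signal peptide
and the mature coat sequence, it ends up at the amino terminus
of the processed protein, which is the end that lies on the
outside of the virion.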

2. M13 Minor Coat Proteins, Generally

An introduced binding domain or epitope may also be displayed
on a filamentous phage as a portion of a chimeric minor coat
protein. These are encoded by genes III, VI, VII, and IX, and
each is present in about 5 copies per virion and is related to
morphogenesis or infection. In contrast, the major coat
protein is present in more than 2500 copies per virion. The
gene III, VI, VII, and IX proteins are present at the ends of
the virion.

3. The M13 gIII Minor Coat Protein

The single-stranded circular phage DNA associates with about
five copies of the gene III protein and is then extruded
through the patch of membrane-associated coat protein in such
a way that the DNA is encased in a helical sheath of protein
(WEBS78). The DNA does not base pair (that would impose severe
restrictions on the virus genome); rather the bases
intercalate with each other independent of sequence.

Smith (SMIT85) and de la Cruz et al. (DELA88) have shown that
insertions into gene III cause novel protein domains to appear
on the virion outer surface. The mini-protein's gene may be
fused to gene III at the site used by Smith and by de la Cruz
et al., at a codon corresponding to another domain boundary or
to a surface loop of the protein, or to the amino terminus of
the mature protein.

All published works use a vector containing a single modified
gene III of fd. Thus, all five copies of gIII are identically
modified. Gene III is quite large (1272 b.p. or about 20% of
the phage genome) and it is uncertain whether a duplicate of
the whole gene can be stably inserted into the phage.
Furthermore, all five copies of gIII protein are at one end of
the virion. When bivalent target molecules (such as
antibodies) bind a pentavalent phage, the resulting complex
may be irreversible. Irreversible binding of the phage to the
target greatly interferes with affinity enrichment of the GPs
that carry the genetic sequences encoding the novel
polypeptide having the highest affinity for the target.

To reduce the likelihood of formation of irreversible
complexes, we may use a second, synthetic gene that encodes
only the carboxy-terminal portion of III. We might, for
example, engineer a gene that comprises (from 5' to 3'):

1) a promoter (preferably regulated),
2) a ribosome-binding site,
3) an initiation codon,
4) a sequence encoding a functional signal peptide directing
   secretion of parts (5) and (6) through the inner membrane,
5) DNA encoding a potential binding domain,
6) DNA encoding residues 275 through 424 of M13 gIII protein,
7) a translation stop codon, and
8) (optionally) a transcription stop signal.
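
The ordering of these eight elements, and the size of the
reading frame they imply, can be laid out in a short Python
sketch. The class and function names are hypothetical, the
signal peptide and binding domain lengths used in the example
are arbitrary, and the initiation codon is counted separately
from the signal peptide simply to mirror the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    name: str
    codons: int = 0  # 0 for control elements outside the reading frame


def g3_anchor_codons(first_residue: int = 275, last_residue: int = 424) -> int:
    """Number of gIII residues retained, inclusive of both end points."""
    return last_residue - first_residue + 1


def build_cassette(signal_codons: int, pbd_codons: int) -> list[Element]:
    """Return the elements in 5' to 3' order, following the list in the text."""
    return [
        Element("promoter"),
        Element("ribosome-binding site"),
        Element("initiation codon", 1),
        Element("signal peptide", signal_codons),
        Element("potential binding domain", pbd_codons),
        Element("gIII residues 275-424", g3_anchor_codons()),
        Element("translation stop codon", 1),
        Element("transcription stop signal"),
    ]


def open_reading_frame_nt(cassette: list[Element]) -> int:
    """Nucleotides from the initiation codon through the stop codon."""
    return 3 * sum(element.codons for element in cassette)


# Hypothetical lengths: 20-codon signal peptide, 58-codon binding domain.
cassette = build_cassette(signal_codons=20, pbd_codons=58)
for element in cassette:
    print(f"{element.name}: {element.codons} codons")
print("reading frame:", open_reading_frame_nt(cassette), "nt")
```

The retained gIII segment contributes 150 codons, so such a
synthetic gene is considerably shorter than a duplicate of the
complete 1272 b.p. gene III.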
Note that in the gIII protein, the amino terminal moiety is
responsible for pilus binding (i.e., for infectivity) and the
carboxy terminal moiety for packaging, so that the chimeric
gIII protein described above is able to assemble into the
viral coat, but does not contribute to infectivity. We leave
the wild-type gene III so that some unaltered gene III protein
will be present.

Thus, the hybrid gene may comprise DNA encoding a potential
binding domain operably linked to a signal sequence (e.g., the
signal sequences of the bacterial phoA or bla genes or the
signal sequence of M13 phage gene III) and to DNA encoding at
least a functional portion of a coat protein (e.g., the M13
gene III or gene VIII proteins) of a filamentous phage (e.g.,
M13). The expression product is transported to the inner
membrane (lipid bilayer) of the host cell, whereupon the
signal peptide is cleaved off to leave a processed hybrid
protein. The C-terminus of the coat protein-like component of
this hybrid protein is trapped in the lipid bilayer, so that
the hybrid protein does not escape into the periplasmic space.
(This is typical of the wild-type coat protein.) As the
single-stranded DNA of the nascent phage particle passes into
the periplasmic space, it collects both wild-type coat protein
and the hybrid protein from the lipid bilayer. The hybrid
protein is thus packaged into the surface sheath of the
filamentous phage, leaving the potential binding domain
exposed on its outer surface.

4. Coat Proteins of Pf3
Similar constructions could be made with other
filamentous phage. Pf3 is a well known filamentous phage
that infects Pseudomonas aeruginosa cells that harbor an
IncP-1 plasmid. The entire genome has been sequenced
(LUIT85) and the genetic signals involved in replication
and assembly are known (LUIT87). The major coat protein
of Pf3 is unusual in having no signal peptide to direct
its secretion. The sequence has charged residues ASP7,
ARG37, LYS40, and PHE44-COO-, which is consistent with the
amino terminus being exposed. Thus, to cause an IPBD to
appear on the surface of Pf3, we construct a tripartite
gene comprising:

1) a signal sequence known to cause secretion in P.
aeruginosa (preferably known to cause secretion of
IPBD) fused in-frame to,

2) a gene fragment encoding the IPBD sequence, fused
in-frame to,

3) DNA encoding the mature Pf3 coat protein.

Optionally, DNA encoding a flexible linker of one to 10
amino acids is introduced between the ipbd gene fragment
and the Pf3 coat-protein gene. Optionally, DNA encoding
the recognition site for a specific protease, such as
tissue plasminogen activator or blood clotting Factor Xa,
is introduced between the ipbd gene fragment and the Pf3
coat-protein gene. Amino acids that form the recognition
site for a specific protease may also serve the function
of a flexible linker. This tripartite gene is introduced
into Pf3 so that it does not interfere with expression of
any Pf3 genes. To reduce the possibility of genetic
recombination, part (3) is designed to have numerous
silent mutations relative to the wild-type gene. Once
the signal sequence is cleaved off, the IPBD is in the
periplasm and the mature coat protein acts as an anchor
and phage-assembly signal. It does not matter that this
fusion protein comes to rest in the lipid bilayer by a
route different from the route followed by the wild-type
coat protein.

Gene Construction

The structural coding sequence of the display gene
encodes a chimeric coat protein and any required
secretion signal. A "chimeric coat protein" is a fusion
of a first amino acid sequence (essentially corresponding
to at least a functional portion of a phage coat protein)
with a second amino acid sequence, e.g., a domain foreign
to and not substantially homologous with any domain of
the first protein. A chimeric protein may present a
foreign domain which is found (albeit in a different
protein) in an organism which also expresses the first
protein, or it may be an "interspecies", "intergeneric",
etc. fusion of protein structures expressed by different
kinds of organisms. The foreign domain may appear at the
amino or carboxy terminal of the first amino acid
sequence (with or without an intervening spacer), or it
may interrupt the first amino acid sequence. The first
amino acid sequence may correspond exactly to a surface
protein of the phage, or it may be modified, e.g., to
facilitate the display of the binding domain.

A preferred site for insertion of the ipbd gene into
the phage osp gene is one in which: a) the IPBD folds
into its original shape, b) the OSP domains fold into
their original shapes, and c) there is no interference
between the two domains.

If there is a model of the phage that indicates that
either the amino or carboxy terminus of an OSP is exposed
to solvent, then the exposed terminus of that mature OSP
becomes the prime candidate for insertion of the ipbd
gene. A low resolution 3D model suffices.

In the absence of a 3D structure, the amino and
carboxy termini of the mature OSP are the best candidates
for insertion of the ipbd gene. A functional fusion may
require additional residues between the IPBD and OSP
domains to avoid unwanted interactions between the
domains. Random-sequence DNA, or DNA coding for a
specific sequence of a protein homologous to the IPBD or
OSP, can be inserted between the osp fragment and the
ipbd fragment if needed.

Fusion at a domain boundary within the OSP is also
a good approach for obtaining a functional fusion. Smith
exploited such a boundary when subcloning heterologous
DNA into gene III of f1 (SMIT85).

The criteria for identifying OSP domains suitable
for causing display of an IPBD are somewhat different
from those used to identify an IPBD. When identifying
an OSP, minimal size is not so important because the OSP
domain will not appear in the final binding molecule nor
will we need to synthesize the gene repeatedly in each
variegation round. The major design concerns are that:
a) the OSP::IPBD fusion causes display of IPBD, b) the
initial genetic construction be reasonably convenient,
and c) the osp::ipbd gene be genetically stable and
easily manipulated. There are several methods of
identifying domains. Methods that rely on atomic
coordinates have been reviewed by Janin and Chothia
(JANI85). These methods use matrices of distances
between α carbons (Cα), dividing planes (cf. ROSE85), or
correlated the behavior of many natural proteins with
domain structure (according to their definition). Rashin
correctly predicted the stability of a domain comprising
residues 206-316 of thermolysin (VITA84, RASH84).

Many researchers have used partial proteolysis and
protein sequence analysis to isolate and identify stable
domains. (See, for example, VITA84, POTE83, SCOT87a, and
PABO79.) Pabo et al. used calorimetry as an indicator
that the cI repressor from the coliphage λ contains two
domains; they then used partial proteolysis to determine
the location of the domain boundary.

If the only structural information available is the
amino acid sequence of the candidate OSP, we can use the
sequence to predict turns and loops. There is a high
probability that some of the loops and turns will be
correctly predicted (cf. Chou and Fasman, (CHOU74));
these locations are also candidates for insertion of the
ipbd gene fragment.

Fusing one or more new domains to a protein may make
the ability of the new protein to be exported from the
cell different from the ability of the parental protein.
The signal peptide of the wild-type coat protein may
function for the authentic polypeptide but be unable to
direct export of a fusion. To utilize the Sec-dependent
pathway, one may need a different signal peptide. Thus,
to express and display a chimeric BPTI/M13 gene VIII
protein, we found it necessary to utilize a heterologous
signal peptide (that of phoA).
Phage that display peptides having high affinity for
the target may be quite difficult to elute from the
target, particularly a multivalent target. One can
introduce a cleavage site for a specific protease, such
as blood-clotting Factor Xa, into the fusion protein so
that the binding domain can be cleaved from the genetic
package. Such cleavage has the advantage that all
resulting phage have identical coat proteins and
therefore are equally infective, even if polypeptide-
displaying phage can be eluted from the affinity matrix
without cleavage. This step allows recovery of valuable
genes which might otherwise be lost. To our knowledge,
no one has disclosed or suggested using a specific
protease as a means to recover an information-containing
genetic package or of converting a population of phage
that vary in infectivity into phage having identical
infectivity.

There exist a number of highly specific proteases.
While the invention does not reside in the choice of any
particular protease, the protease is preferably
sufficiently specific so that under the cleavage
conditions, it will cleave the linker but not any
polypeptide essential to the viability of the phage, or
(save in rare cases) the potential epitope/binding
domain. It is possible that choice of particular
cleavage conditions, e.g., low temperature, may make it
feasible to use a protease that would otherwise be
unsuitable.

The blood-clotting and complement systems
contain a number of very specific proteases. Usually,
the enzymes at the early stages of a cascade are more
specific than are the later ones. For example, Factor Xa
(F.Xa) is more specific than is thrombin (cp. Table 10-2
of COLM87). Bovine F.Xa cleaves after the sequence
Ile-Glu-Gly-Arg while human F.Xa cleaves after
Ile-Asp-Gly-Arg. Either protease-linker pair may be used,
as desired. If thrombin is used, the most preferred
thrombin-sensitive linkers are those found in fibrinogen,
Factor XIII, and prothrombin. Preferably, one would take
the linker sequence from the species from which the
thrombin is obtained; for example, if bovine thrombin is
to be used, then one uses a linker taken from bovine
fibrinogen or bovine F.XIII.
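
By way of illustration only, the effect of placing a Factor Xa
recognition sequence in a linker can be sketched in a few lines of
Python. The recognition sequences are those stated above; the
function names and the example fusion sequence are hypothetical and
serve only to show where the cut falls (immediately after the Arg).

```python
import re

# Recognition sequences as stated in the text (one-letter code).
FACTOR_XA_SITES = {
    "bovine": "IEGR",  # Ile-Glu-Gly-Arg
    "human": "IDGR",   # Ile-Asp-Gly-Arg
}


def cleavage_positions(protein, species="bovine"):
    """Return the index just past each recognition site (the cut point)."""
    site = FACTOR_XA_SITES[species]
    return [match.end() for match in re.finditer(site, protein)]


def cleave(protein, species="bovine"):
    """Split a sequence after every recognition site for the given enzyme."""
    fragments, start = [], 0
    for cut in cleavage_positions(protein, species):
        fragments.append(protein[start:cut])
        start = cut
    fragments.append(protein[start:])
    return fragments


# Hypothetical fusion: binding domain, bovine F.Xa linker, coat-protein anchor.
fusion = "AAAA" + "IEGR" + "CCCC"
print(cleave(fusion, "bovine"))  # binding domain plus linker, then anchor
print(cleave(fusion, "human"))   # the human site is absent, so no cut
```

The sketch treats specificity as an exact four-residue match, which is
a simplification; it is not a model of the enzyme's actual behavior.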

Human Factor XIa cleaves human Factor IX at two
places (COLM87, p. 42):

Q T S K L T R(145) / A E A V F and
S F N D F T R(180) / V V G G E.

Thus one could incorporate either of these sequences
(especially the underscored portions) as a linker between
the PBD and the GP-surface-anchor domain (GPSAD) and use
human F.XIa to cleave them.

Human kallikrein cuts human F.XII at R353 (COLM87,
p. 258):

L F S S M T R / V V G G L V.

This sequence has significant similarity to the hF.XIa
sites above. One could incorporate the sequence SSMTRVVG
as a linker between PBD and GPSAD and cleave PBD from the
GP with human kallikrein.

Human F.XIIa cuts human F.XI (COLM87, p. 256) at:

K I K P R / I V G G T.

One could incorporate KIKPRIVG as a linker between PBD
and GPSAD. PBD could then be cleaved from GP with
hF.XIIa.

Other proteases that have been used to cleave fusion
proteins include enterokinase, collagenase, chymosin,
urokinase, renin, and certain signal peptidases. See
Rutter, US 4,769,326.

When a protease inhibitor is sought, the target
protease and other proteases having similar substrate
specificity are not preferred for cleaving the PBD from
the GP. It is preferred that a linker resembling the
substrate of the target protease not be incorporated
anywhere on the display phage because this could make
separation of excellent binders from the rest of the
population needlessly more difficult.

If there is steric hindrance of the site-specific
cleavage of the linker, the linker may be modified so
that the cleavage site is more exposed, e.g., by
interposing glycines (for additional flexibility) or
prolines (for maximum elongation) between the cleavage
site and the bulk of the protein. GUAN91 improved
thrombin cleavage of a GST fusion protein by introducing
a glycine-rich linker (PGISGGGGG) immediately after the
thrombin cleavage site (LVPRGS). A suitable linker may
also be identified by variegation-and-selection.

The sequences of regulatory parts of the gene are
taken from the sequences of natural regulatory elements:
a) promoters, b) Shine-Dalgarno sequences, and c)
transcriptional terminators. Regulatory elements could also
be designed from knowledge of consensus sequences of
natural regulatory regions. The sequences of these
regulatory elements are connected to the coding regions;
restriction sites are also inserted in or adjacent to the
regulatory regions to allow convenient manipulation.

The essential function of the affinity separation is
to separate GPs that bear PBDs (derived from IPBD) having
high affinity for the target from GPs bearing PBDs having
low affinity for the target. If the elution volume of a
phage depends on the number of PBDs on the phage surface,
then a phage bearing many PBDs with low affinity,
GP(PBDw), might co-elute with a phage bearing fewer PBDs
with high affinity, GP(PBDs). Regulation of the display
gene preferably is such that most phage display
sufficient PBD to effect a good separation according to
affinity. Use of a regulatable promoter to control the
level of expression of the display gene allows fine
adjustment of the chromatographic behavior of the
variegated population.

Induction of synthesis of engineered genes in
vegetative bacterial cells has been exercised through the
use of regulated promoters such as lacUV5, trpP, or tac
(MANI82). The factors that regulate the quantity of
protein synthesized are sufficiently well understood that
a wide variety of heterologous proteins can now be
produced in E. coli, B. subtilis and other host cells in
at least moderate quantities (SKER88, BETT88).
Preferably, the promoter for the display gene is subject
to regulation by a small chemical inducer. For example,
the lac promoter and the hybrid trp-lac (tac) promoter
are regulatable with isopropyl thiogalactoside (IPTG).
The promoter for the constructed gene need not come from
a natural osp gene; any regulatable promoter functional
in bacteria can be used. A non-leaky promoter is
preferred.

The coding portions of genes to be synthesized are
designed at the protein level and then encoded in DNA.
While the primary consideration in devising the DNA
sequence is obtaining the desired diverse population of
potential binding domains (or epitopes), consideration is
also given to providing restriction sites to facilitate
further gene manipulation, minimizing the potential for
recombination and spontaneous mutation, and achieving
efficient translation in the chosen host cells.

The present invention is not limited to any
particular method of DNA synthesis or construction.
Conventional DNA synthesizers may be used, with appropriate
reagent modifications for production of variegated DNA
(similar to that now used for production of mixed
probes).

The phage are genetically engineered and then
transfected into host cells, e.g., E. coli, B. subtilis,
or P. aeruginosa, suitable for amplification. The
present invention is not limited to any one method of
transforming cells with DNA or to any particular host
cells.

THE INITIAL POTENTIAL BINDING DOMAIN (IPBD):

By virtue of the present invention, proteins may be
obtained which can bind specifically to targets other
than the antigen-combining sites of antibodies. For the
purposes of the appended claims, a protein P is a binding
protein if, for a species A other than the variable
domain of an antibody, the dissociation constant
K_D(P,A) < 10^-6 moles/liter (preferably, < 10^-7
moles/liter). The exclusion of "variable domain of an
antibody" is intended to make clear that for the purposes
herein a protein is not to be considered a "binding
protein" merely because it is antigenic. However, an
antigen may nevertheless qualify as a binding protein
because it specifically binds to a substance other than
an antibody, e.g., an enzyme for its substrate, or a
hormone for its cellular receptor. Additionally, it
should be pointed out that "binding protein" may include
a protein which binds specifically to the Fc of an
antibody, e.g., staphylococcal protein A.

While the present invention may be used to develop
novel antibodies through variegation of codons
corresponding to the hypervariable region of an
antibody's variable domain, its primary utility resides
in the development of binding proteins which are not
antibodies or even variable domains of antibodies. Novel
antibodies can be obtained by immunological techniques;
novel enzymes, hormones, etc. cannot.

Most larger proteins fold into distinguishable
globules called domains. The display strategy is first
perfected by modifying a genetic phage to display a
stable, structured domain (the "initial potential binding
domain", IPBD) for which an affinity molecule (which may
be an antibody) is obtainable. The success of the
modifications is readily measured by, e.g., determining
whether the modified genetic phage binds to the affinity
molecule. For the purpose of identifying IPBDs,
definitions of "domain" which emphasize stability
(retention of the overall structure in the face of
perturbing forces such as elevated temperatures or
chaotropic agents) are favored, though atomic
coordinates and protein sequence homology are not completely
ignored. When a domain of a protein is primarily
responsible for the protein's ability to specifically
bind a chosen target, it is referred to herein as a
"binding domain" (BD).

The IPBD is chosen with a view to its tolerance for
extensive mutagenesis. Once it is known that the IPBD
can be displayed on a surface of a phage and subjected to
affinity selection, the gene encoding the IPBD is
subjected to a special pattern of multiple mutagenesis, here
termed "variegation", which after appropriate cloning and
amplification steps leads to the production of a
population of phage each of which displays a single potential
binding domain (a mutant of the IPBD), but which
collectively display a multitude of different though
structurally related potential binding domains (PBDs).
Each genetic phage carries the version of the pbd gene
that encodes the PBD displayed on the surface of that
particular phage. Affinity selection is then used to
identify the display phage bearing the PBDs with the
desired binding characteristics, and these display phage
may then be amplified. After one or more cycles of
enrichment by affinity selection and amplification, the
DNA encoding the successful binding domains (SBDs) may
then be recovered from selected phage.

If need be, the DNA from the SBD-bearing phage may
then be further "variegated", using an SBD of the last
round of variegation as the "parental potential binding
domain" (PPBD) to the next generation of PBDs, and the
process continued until the worker in the art is
satisfied with the result. At that point, the SBD may be
produced by any conventional means, including chemical
synthesis.

The initial potential binding domain may be: 1) a
domain of a naturally occurring protein, 2) a
non-naturally occurring domain which substantially corresponds in
sequence to a naturally occurring domain, but which
differs from it in sequence by one or more substitutions,
insertions or deletions, 3) a domain substantially
corresponding in sequence to a hybrid of subsequences of
two or more naturally occurring proteins, or 4) an
artificial domain designed entirely on theoretical
grounds based on knowledge of amino acid geometries and
statistical evidence of secondary structure preferences
of amino acids. (However, the limitations of a priori
protein design prompted the present invention.) Usually,
the domain will be a known binding domain, or at least a
homologue thereof, but it may be derived from a protein
which, while not possessing a known binding activity,
possesses a secondary or higher structure that lends
itself to binding activity (clefts, grooves, etc.). The
protein to which the IPBD is related need not have any
specific affinity for the target material.

In determining whether sequences should be deemed to
"substantially correspond", one should consider the
following issues: the degree of sequence similarity when
the sequences are aligned for best fit according to
standard algorithms, the similarity in the connectivity
patterns of any crosslinks (e.g., disulfide bonds), the
degree to which the proteins have similar
three-dimensional structures, as indicated by, e.g., X-ray
diffraction analysis or NMR, and the degree to which the
sequenced proteins have similar biological activity. In
this context, it should be noted that among the serine
protease inhibitors, there are families of proteins
recognized to be homologous in which there are pairs of
members with as little as 30% sequence homology.

A candidate IPBD should meet the following criteria:
1) a domain exists that will remain stable under the
conditions of its intended use (the domain may
comprise the entire protein that will be inserted,
e.g., BPTI, α-conotoxin GI, or CMTI-III),
2) knowledge of the amino acid sequence is obtainable,
and
3) a molecule is obtainable having specific and high
affinity for the IPBD, AfM(IPBD).

Preferably, in order to guide the variegation strategy,
knowledge of the identity of the residues on the domain's
outer surface, and their spatial relationships, is
obtainable; however, this consideration is less important
if the binding domain is small, e.g., under 40 residues.

Preferably, the IPBD is no larger than necessary
because small SBDs (for example, less than 40 amino
acids) can be chemically synthesized and because it is
easier to arrange restriction sites in smaller amino-acid
sequences. For PBDs smaller than about 40 residues, an
added advantage is that the entire variegated pbd gene
can be synthesized in one piece. In that case, we need
arrange only suitable restriction sites in the osp gene.
A smaller protein minimizes the metabolic strain on the
phage or the host of the GP. The IPBD is preferably
smaller than about 200 residues. The IPBD must also be
large enough to have acceptable binding affinity and
specificity. For an IPBD lacking covalent crosslinks,
such as disulfide bonds, the IPBD is preferably at least
40 residues; it may be as small as six residues if it
contains a crosslink.

There are many candidate IPBDs, for example, bovine
pancreatic trypsin inhibitor (BPTI, 58 residues), CMTI-III
(29 residues), crambin (46 residues), third domain of
ovomucoid (56 residues), heat-stable enterotoxin (ST-Ia
of E. coli) (18 residues), α-Conotoxin GI (13 residues),
μ-Conotoxin GIII (22 residues), Conus King Kong
mini-protein (27 residues), T4 lysozyme (164 residues), and
azurin (128 residues). Table 50 lists several preferred
IPBDs.
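
As a rough illustration of the size considerations above, the
candidates just listed can be screened against the approximately
40-residue threshold below which the entire variegated pbd gene can be
synthesized in one piece. The residue counts are those given in the
text; the dictionary keys, the function name, and the choice to treat
the threshold as a strict cutoff are assumptions made for this sketch.

```python
# Residue counts as listed in the text; the labels are informal names.
CANDIDATE_IPBDS = {
    "BPTI": 58,
    "CMTI-III": 29,
    "crambin": 46,
    "ovomucoid third domain": 56,
    "ST-Ia": 18,
    "alpha-conotoxin GI": 13,
    "mu-conotoxin GIII": 22,
    "Conus King Kong mini-protein": 27,
    "T4 lysozyme": 164,
    "azurin": 128,
}


def single_piece_candidates(candidates, max_residues=40):
    """Names of candidates smaller than max_residues, smallest first."""
    small = [(size, name) for name, size in candidates.items()
             if size < max_residues]
    return [name for size, name in sorted(small)]


print(single_piece_candidates(CANDIDATE_IPBDS))
```

Size is only one of the stated criteria; stability and the
availability of an affinity molecule are not modeled here.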

In some cases, a protein having some affinity for
the target may be a preferred IPBD even though some other
criteria are not optimally met. For example, the V1
domain of CD4 is a good choice as IPBD for a protein that
binds to gp120 of HIV. It is known that mutations in the
region 42 to 55 of V1 greatly affect gp120 binding and
that other mutations either have much less effect or
completely disrupt the structure of V1. Similarly, tumor
necrosis factor (TNF) would be a good initial choice if
one wants a TNF-like molecule having higher affinity for
the TNF receptor.

As even surface mutations may reduce the stability
of the PBD, the chosen IPBD should have a high melting
temperature (50°C acceptable, the higher the better; BPTI
melts at 95°C) and be stable over a wide pH range (8.0
to 3.0 acceptable; 11.0 to 2.0 preferred), so that the
SBDs derived from the chosen IPBD by mutation and
selection-through-binding will retain sufficient
stability. Preferably, the substitutions in the IPBD yielding
the various PBDs do not reduce the melting point of the
domain below ~40°C. Mutations may arise that increase the
stability of SBDs relative to the IPBD, but the process
of the present invention does not depend upon this
occurring. Proteins containing covalent crosslinks, such
as multiple disulfides, are usually sufficiently stable.
A protein having at least two disulfides and having at
least one disulfide for every twenty residues may be
presumed to be sufficiently stable.
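
The disulfide rule of thumb stated above can be written as a short
check. This is a sketch of that one heuristic only; the function name
is hypothetical, and the disulfide counts used in the example (three
for BPTI, two for α-conotoxin GI) are well-known values for those
proteins rather than figures taken from this section.

```python
def presumed_stable(n_residues, n_disulfides):
    """Apply the rule: at least two disulfides, and at least one
    disulfide for every twenty residues."""
    has_two = n_disulfides >= 2
    dense_enough = n_disulfides * 20 >= n_residues
    return has_two and dense_enough


print(presumed_stable(58, 3))  # BPTI: 58 residues, three disulfides
print(presumed_stable(13, 2))  # alpha-conotoxin GI: 13 residues, two
print(presumed_stable(58, 2))  # a 58-residue domain with only two
```

A protein that fails this check is not thereby shown to be unstable;
the rule only identifies cases where stability may be presumed.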

If the target is a protein or other macromolecule, a
preferred embodiment of the IPBD is a small protein such
as the Cucurbita maxima trypsin inhibitor III (29
residues), BPTI from Bos taurus (58 residues), crambin from
rape seed (46 residues), or the third domain of ovomucoid
from Coturnix coturnix japonica (Japanese quail) (56
residues), because targets from this class have clefts
and grooves that can accommodate small proteins in highly
specific ways. If the target is a macromolecule lacking
a compact structure, such as starch, it should be treated
as if it were a small molecule. Extended macromolecules
with defined 3D structure, such as collagen, should be
treated as large molecules.

If the target is a small molecule, such as a
steroid, a preferred embodiment of the IPBD is a protein
of about 80-200 residues, such as ribonuclease from Bos
taurus (124 residues), ribonuclease from Aspergillus
oryzae (104 residues), hen egg white lysozyme from Gallus
gallus (129 residues), azurin from Pseudomonas aeruginosa
(128 residues), or T4 lysozyme (164 residues), because
such proteins have clefts and grooves into which the
small target molecules can fit. The Brookhaven Protein
Data Bank contains 3D structures for all of the proteins
listed. Genes encoding proteins as large as T4 lysozyme
can be manipulated by standard techniques for the
purposes of this invention.

If the target is a mineral, insoluble in water, one
considers the nature of the molecular surface of the
mineral. Minerals that have smooth surfaces, such as
crystalline silicon, are best addressed with medium to
large proteins, such as ribonuclease, as IPBD in order to
have sufficient contact area and specificity. Minerals
with rough, grooved surfaces, such as zeolites, could be
bound either by small proteins, such as BPTI, or larger
proteins, such as T4 lysozyme.
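
The guidance of the preceding paragraphs on matching IPBD size to the
class of target may be summarized, as a sketch only, in a small lookup.
The category labels and function name are hypothetical; the guidance
strings paraphrase the text and add nothing to it.

```python
# Guidance paraphrased from the text; the keys are hypothetical labels.
GUIDANCE = {
    "macromolecule": "small protein, e.g. CMTI-III (29) or BPTI (58 residues)",
    "small molecule": "protein of about 80-200 residues",
    "smooth mineral": "medium to large protein, e.g. ribonuclease",
    "rough mineral": "either a small protein or a larger protein",
}

# Special cases named in the text are redirected to one of the classes above.
TREATED_AS = {
    "unstructured macromolecule": "small molecule",  # e.g. starch
    "extended macromolecule": "macromolecule",       # e.g. collagen
}


def ipbd_guidance(target_kind):
    """Return the size guidance for a target class, resolving special cases."""
    kind = TREATED_AS.get(target_kind, target_kind)
    if kind not in GUIDANCE:
        raise ValueError(f"no guidance for target kind: {target_kind!r}")
    return GUIDANCE[kind]


print(ipbd_guidance("unstructured macromolecule"))
```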

BPTI is an especially preferred IPBD because it
meets or exceeds all the criteria: it is a small, very
stable protein with a well known 3D structure.

Small polypeptides have potential advantages over
larger polypeptides when used as therapeutic or
diagnostic agents, including (but not limited to):
a) better penetration into tissues,
b) faster elimination from the circulation (important
for imaging agents),
c) lower antigenicity, and
d) higher activity per mass.

Thus, it would be desirable to be able to employ the
combination of variegation and affinity selection to
identify small polypeptides which bind a target of
choice.

Polypeptides of this size, however, have
disadvantages as binding molecules. According to Olivera et al.
(OLIV90a): "Peptides in this size range normally
equilibrate among many conformations (in order to have a
fixed conformation, proteins generally have to be much
larger)." Specific binding of a peptide to a target
molecule requires the peptide to take up one conformation
that is complementary to the binding site.

In one embodiment, the present invention overcomes
these problems, while retaining the advantages of smaller
polypeptides, by fostering the biosynthesis of novel
mini-proteins having the desired binding characteristics.
Mini-proteins are small polypeptides (usually less than
about 60 residues, more preferably less than 40 residues
("micro-proteins")) which, while too small to have a
stable conformation as a result of noncovalent forces
alone, are covalently crosslinked (e.g., by disulfide
bonds) into a stable conformation and hence have
biological activities more typical of larger protein
molecules than of unconstrained polypeptides of
comparable size.

When mini-proteins are variegated, the residues
which are covalently crosslinked in the parental molecule
are left unchanged, thereby stabilizing the conformation.
For example, in the variegation of a disulfide bonded
mini-protein, certain cysteines are invariant so that
under the conditions of expression and display, covalent
crosslinks (e.g., disulfide bonds between one or more
pairs of cysteines) form, and substantially constrain the
conformation which may be adopted by the hypervariable
linearly intermediate amino acids. In other words, a
constraining scaffolding is engineered into polypeptides
which are otherwise extensively randomized.

Once a mini-protein of desired binding
characteristics is characterized, it may be produced, not only by
recombinant DNA techniques, but also by nonbiological
synthetic methods.

For the purpose of the appended claims, a
mini-protein has between about eight and about 60 residues.
An intrachain disulfide bridge connecting amino acids 3
and 8 of a 16 residue polypeptide will be said herein to
have a span of 4. If amino acids 4 and 12 are also
disulfide bonded, then their bridge has a span of 7.
Together, the four cysteines divide the polypeptide into
four intercysteine segments (1-2, 5-7, 9-11, and 13-16).
(Note that there is no segment between Cys3 and Cys4.)
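
The definitions of span and intercysteine segment given above can be
checked with a brief sketch. The function names are hypothetical;
positions are 1-based residue numbers as in the text, and the span is
taken to be the number of residues lying strictly between the two
bridged cysteines.

```python
def span(first_cys, second_cys):
    """Number of residues strictly between two disulfide-bonded cysteines."""
    return abs(second_cys - first_cys) - 1


def intercysteine_segments(length, cysteines):
    """Return (start, end) residue ranges that contain no cysteine."""
    segments = []
    previous = 0  # position before the first residue
    for position in sorted(cysteines) + [length + 1]:
        if position - previous > 1:
            segments.append((previous + 1, position - 1))
        previous = position
    return segments


# The 16-residue example from the text: bridges 3-8 and 4-12.
print(span(3, 8))    # 4
print(span(4, 12))   # 7
print(intercysteine_segments(16, [3, 4, 8, 12]))
# [(1, 2), (5, 7), (9, 11), (13, 16)]
```

Adjacent cysteines, such as Cys3 and Cys4, produce no segment, in
agreement with the note above.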

The connectivity pattern of a crosslinked
mini-protein is a simple description of the relative location
of the termini of the crosslinks. For example, for a
mini-protein with two disulfide bonds, the connectivity
pattern "1-3, 2-4" means that the first crosslinked
cysteine is disulfide bonded to the third crosslinked
cysteine (in the primary sequence), and the second to the
fourth.
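
For illustration, the possible connectivity patterns for a given
number of crosslinked cysteines can be enumerated as the ways of
pairing them off. The sketch below is not part of the disclosed
method; its function names are hypothetical. It assumes every cysteine
takes part in exactly one disulfide bond. The counts it produces for
two, three and four bonds (3, 15 and 105) agree with the topology
counts given in the discussion of Class III mini-proteins.

```python
def connectivity_patterns(n_cysteines):
    """List every way to pair n_cysteines crosslinked cysteines (1-based)."""
    if n_cysteines % 2:
        raise ValueError("an even number of cysteines is required")

    def pairings(items):
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for k, partner in enumerate(rest):
            remaining = rest[:k] + rest[k + 1:]
            for tail in pairings(remaining):
                yield [(first, partner)] + tail

    return list(pairings(list(range(1, n_cysteines + 1))))


def format_pattern(pattern):
    """Render a pairing in the notation used in the text, e.g. '1-3, 2-4'."""
    return ", ".join(f"{a}-{b}" for a, b in pattern)


for pattern in connectivity_patterns(4):
    print(format_pattern(pattern))
print(len(connectivity_patterns(6)), len(connectivity_patterns(8)))
```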
The variegated disulfide-bonded mini-proteins of the
present invention fall into several classes.

Class I mini-proteins are those featuring a single
pair of cysteines capable of interacting to form a
disulfide bond, said bond having a span of no more than
nine residues. This disulfide bridge preferably has a
span of at least two residues; this is a function of the
geometry of the disulfide bond. When the spacing is two
or three residues, one residue is preferably glycine in
order to reduce the strain on the bridged residues. The
upper limit on spacing is less precise; however, in
general the greater the spacing, the less the constraint
on conformation imposed on the linearly intermediate
amino acid residues by the disulfide bond.
A disulfide bridge with a span of 4 or 5 is
especially preferred. If the span is increased to 6, the
constraining influence is reduced. In this case, we
prefer that at least one of the enclosed residues be an
amino acid that imposes restrictions on the main-chain
geometry. Proline imposes the most restriction. Valine
and isoleucine restrict the main chain to a lesser
extent. The preferred position for this constraining
non-cysteine residue is adjacent to one of the invariant
cysteines; however, it may be one of the other bridged
residues. If the span is seven, we prefer to include two
amino acids that limit main-chain conformation. These
amino acids could be at any of the seven positions, but
are preferably the two bridged residues that are
immediately adjacent to the cysteines. If the span is
eight or nine, additional constraining amino acids may be
provided.

Additional amino acids may appear on the amino side of the
first cysteine or the carboxy side of the second cysteine. Only
the immediately proximate "unspanned" amino acids are likely to
have a significant effect on the conformation of the span.

Class II mini-proteins are those featuring a single disulfide
bond having a span of greater than nine amino acids. The
bridged amino acids form secondary structures which help to
stabilize their conformation. Preferably, these intermediate
amino acids form hairpin supersecondary structures such as
those schematized below:

      ,----------S--S-----------,
     -Cys-α helix-turn-β strand-Cys-

      ,----------S--S----------,
     -Cys-α helix-turn-α helix-Cys-

In designing a suitable hairpin structure, one may copy an
actual structure from a protein whose three-dimensional
conformation is known, design the structure using secondary
structure tendency data for the individual amino acids, etc.,
or combine the two approaches. Preferably, one or more actual
structures are used as a model, and the frequency data is used
to determine which mutations can be made without disrupting the
structure.

Preferably, no more than three amino acids lie between the
cysteine and the beginning or end of the α helix or β strand.

More complex structures (such as a double hairpin) are also
possible.

Class III mini-proteins are those featuring a plurality of
disulfide bonds. They optionally may also feature secondary
structures such as those discussed above with regard to Class
II mini-proteins. Since the number of possible disulfide bond
topologies increases rapidly with the number of bonds (two
bonds, three topologies; three bonds, 15 topologies; four
bonds, 105 topologies) the number of disulfide bonds preferably
does not exceed four. Two disulfide bonds are preferable to
three, and three to four. With two or more disulfide bonds, the
disulfide bridge spans preferably do not exceed 30, and the
largest intercysteine chain segment preferably does not exceed
20.
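The topology counts quoted above (3, 15, 105) are the number of ways of pairing 2n cysteines into n disulfides, (2n)!/(2^n n!). The following is a minimal sketch of that count and of an explicit enumeration; the function names are illustrative and are not part of the specification.

```python
from math import factorial


def count_topologies(n_bonds):
    """Number of ways to pair 2*n_bonds cysteines into n_bonds disulfides."""
    n = n_bonds
    return factorial(2 * n) // (2 ** n * factorial(n))


def enumerate_topologies(cysteines):
    """Yield every pairing of an even-length list of cysteine indices.

    Each pairing is a list of (i, j) tuples, i.e. a connectivity pattern
    such as [(1, 3), (2, 4)] for "1-3, 2-4".
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in enumerate_topologies(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "bonds:", count_topologies(n), "topologies")
    for pairing in enumerate_topologies([1, 2, 3, 4]):
        print(", ".join(f"{a}-{b}" for a, b in pairing))
```

The enumeration makes plain why the count grows so quickly: the first cysteine may pair with any of the 2n-1 others, and the same choice recurs for the remainder.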

Naturally occurring class III mini-proteins, such as
heat-stable enterotoxin ST-Ia, frequently have pairs of
cysteines that are clustered (-C-C- or -C-X-C-) in the
amino-acid sequence. Clustering reduces the number of
realizable topologies, and may be advantageous.

Metal Finger Mini-proteins. The mini-proteins of the present
invention are not limited to those crosslinked by disulfide
bonds. Another important class of mini-proteins are analogues
of finger proteins. Finger proteins are characterized by finger
structures in which a metal ion is coordinated by two Cys and
two His residues, forming a tetrahedral arrangement around it.
The metal ion is most often zinc, but may be iron, copper,
cobalt, etc. The "finger" has the consensus sequence
(Phe or Tyr)-(1 AA)-Cys-(2-4 AAs)-Cys-(3 AAs)-Phe-(5 AAs)-
Leu-(2 AAs)-His-(3 AAs)-His-(5 AAs) (BERG88; GIBS88). The
present invention encompasses mini-proteins with either one or
two fingers.
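The finger consensus quoted above can be written as a pattern over one-letter amino-acid codes. The sketch below is illustrative only: the regular expression is a direct transcription of the consensus, and the test peptide is a made-up string chosen to fit it, not data from the specification.

```python
import re

# Any one of the 20 standard amino acids, one-letter code.
AA = "[ACDEFGHIKLMNPQRSTVWY]"

# (Phe or Tyr)-(1 AA)-Cys-(2-4 AAs)-Cys-(3 AAs)-Phe-(5 AAs)-Leu-(2 AAs)-
# His-(3 AAs)-His-(5 AAs)
FINGER = re.compile(
    f"[FY]{AA}C{AA}{{2,4}}C{AA}{{3}}F{AA}{{5}}L{AA}{{2}}H{AA}{{3}}H{AA}{{5}}"
)


def find_fingers(sequence):
    """Return (start, matched_text) for each consensus finger in sequence."""
    return [(m.start(), m.group()) for m in FINGER.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical peptide constructed to fit the consensus.
    example = "YACPVESCDRRFSRSDELTRHIRIHTGQKP"
    print(find_fingers(example))
```

In a variegation plan of the kind described later, the fixed letters of this pattern (Cys, His, and optionally Phe/Tyr, Phe and Leu) are the positions held invariant, and the wildcard positions are the candidates for variation.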

Further diversity may be introduced into a display phage
library of potential binding domains by treating the phage with
(preferably nontoxic) enzymes and/or chemical reagents that can
selectively modify certain side groups of proteins, and thereby
affect the binding properties of the displayed PBDs. Using
affinity separation methods, we enrich for the modified GPs
that bind the predetermined target. Since the active binding
domain is not entirely genetically specified, we must repeat
the post-morphogenesis modification at each enrichment round.
This approach is particularly appropriate with mini-protein
IPBDs because we envision chemical synthesis of these SBDs.

EPITOPIC PEPTIDES

The present invention also relates to the identification of
epitopic peptides which bind to a target which is the epitopic
binding site of an antibody, lectin, enzyme, or other binding
protein. In the case of an antibody, the epitopic peptide will
be at least four amino acids and more preferably at least six
or eight amino acids. Usually, it will be less than 20 amino
acids, but there is no fixed upper limit. In general, however,
the epitopic peptide will be a "linear" or "sequential"
epitope. Typically, in constructing a library of peptides, all
or most of the amino acid positions of the potential epitope
will be varied. However, it is desirable that, among those
amino acids allowed at a particular position, there be a
relatively equal representation, as further discussed below in
the context of mutagenesis of protein domains.

VARIEGATION STRATEGY -- MUTAGENESIS TO OBTAIN POTENTIAL
BINDING DOMAINS (OR EPITOPES) WITH DESIRED DIVERSITY

When the number of different amino acid sequences obtainable by
mutation of the domain is large when compared to the number of
different domains which are displayable in detectable amounts,
the efficiency of the forced evolution is greatly enhanced by
careful choice of which residues are to be varied. First,
residues of a known protein which are likely to affect its
binding activity (e.g., surface residues) and not likely to
unduly degrade its stability are identified. Then all or some
of the codons encoding these residues are varied simultaneously
to produce a variegated population of DNA. The variegated
population of DNA is used to express a variety of potential
binding domains, whose ability to bind the target of interest
may then be evaluated.

The method of the present invention is thus further
distinguished from other methods in the nature of the highly
variegated population that is produced and from which novel
binding proteins are selected. We force the displayed potential
binding domain to sample the nearby "sequence space" of related
amino-acid sequences in an efficient, organized manner. Four
goals guide the various variegation plans used herein,
preferably: 1) a very large number (e.g. 10^7) of variants is
available, 2) a very high percentage of the possible variants
actually appears in detectable amounts, 3) the frequency of
appearance of the desired variants is relatively uniform, and
4) variation occurs only at a limited number of amino-acid
residues, most preferably at residues having side groups
directed toward a common region on the surface of the potential
binding domain.
This is to be distinguished from the simple use of
indiscriminate mutagenic agents such as radiation and
hydroxylamine to modify a gene, where there is no (or very
oblique) control over the site of mutation. Many of the
mutations will affect residues that are not a part of the
binding domain. Moreover, since at a reasonable level of
mutagenesis, any modified codon is likely to be characterized
by a single base change, only a limited and biased range of
possibilities will be explored. Equally remote is the use of
site-specific mutagenesis techniques employing mutagenic
oligonucleotides of nonrandomized sequence, since these
techniques do not lend themselves to the production and testing
of a large number of variants. While focused random mutagenesis
techniques are known, the importance of controlling the
distribution of variation has been largely overlooked.

The term "variegated DNA" (vgDNA) refers to a mixture of DNA
molecules of the same or similar length which, when aligned,
vary at some codons so as to encode at each such codon a
plurality of different amino acids, but which encode only a
single amino acid at other codon positions. It is further
understood that in variegated DNA, the codons which are
variable, and the range and frequency of occurrence of the
different amino acids which a given variable codon encodes, are
determined in advance by the synthesizer of the DNA, even
though the synthetic method does not allow one to know, a
priori, the sequence of any individual DNA molecule in the
mixture. The number of designated variable codons in the
variegated DNA is preferably no more than 20 codons, and more
preferably no more than 5-10 codons. The mix of amino acids
encoded at each variable codon may differ from codon to codon.
A population of display phage into which variegated DNA has
been introduced is likewise said to be "variegated".

When DNA encoding a portion of a known domain of a protein is
variegated, the original domain is called the parent of the
potential binding domains (PPBD), and the multitude of mutant
domains encoded as a result of the variegation are collectively
called the "potential binding domains" (PBD), as their ability
to bind to the predetermined target is not then known.

We now consider the manner in which we generate a diverse
population of potential binding domains in order to facilitate
selection of a PBD-bearing phage which binds with the requisite
affinity to the target of choice. The potential binding domains
are first designed at the amino acid level. Once we have
identified which residues are to be mutagenized, and which
mutations to allow at those positions, we may then design the
variegated DNA which is to encode the various PBDs so as to
assure that there is a reasonable probability that if a PBD has
an affinity for the target, it will be detected. Of course, the
number of independent transformants obtained and the
sensitivity of the affinity separation technology will impose
limits on the extent of variegation possible within any single
round of variegation.

There are many ways to generate diversity in a protein. At one
extreme, we vary a few residues of the protein as much as
possible ("Focused Mutagenesis"), e.g., we pick a set of five
to seven residues and vary each through 13-20 possibilities. An
alternative plan of mutagenesis ("Diffuse Mutagenesis") is to
vary many more residues through a more limited set of choices
(see VERS86a and PAKU86). The variegation pattern adopted may
fall between these extremes.

There is no fixed limit on the number of codons which can be
mutated simultaneously. However, it is desirable to adopt a
mutagenesis strategy which results in a reasonable probability
that a possible PBD sequence is in fact displayed by at least
one phage. Preferably, the probability that a mutein encoded by
the vgDNA and composed of the least favored amino acids at each
variegated position will be displayed by at least one
independent transformant in the library is at least 0.50, and
more preferably at least 0.90. (Muteins composed of more
favored amino acids would of course be more likely to occur in
the same library.)
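The 0.50 and 0.90 criteria above can be turned into a library-size estimate. The sketch below assumes that each independent transformant carries a DNA sequence drawn independently from the vgDNA, so that a mutein whose DNA makes up a fraction p of the mixture is present at least once among N transformants with probability 1 - (1 - p)^N. This sampling model and the function names are assumptions made for illustration, not a statement of the specification's own calculation.

```python
import math


def prob_displayed(p_sequence, n_transformants):
    """P(at least one of n_transformants carries the given sequence)."""
    # 1 - (1 - p)^N, computed stably for very small p.
    return -math.expm1(n_transformants * math.log1p(-p_sequence))


def transformants_needed(p_sequence, target_prob):
    """Smallest N for which prob_displayed(p_sequence, N) >= target_prob."""
    return math.ceil(math.log1p(-target_prob) / math.log1p(-p_sequence))


if __name__ == "__main__":
    # Example: six NNT codons; a least favored amino acid is encoded by
    # 1 of the 16 codons at each position.
    p_least = (1.0 / 16.0) ** 6
    for target in (0.50, 0.90):
        n = transformants_needed(p_least, target)
        print(f"target {target:.2f}: about {n:.2e} independent transformants")
```

Under this model the required library size scales as 1/p, which is why the abundance of the least favored sequence, rather than the average sequence, sets the limit on variegation.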

Preferably, the variegation is such as will cause a typical
transformant population to display 10^6-10^7 different amino
acid sequences by means of preferably not more than 10-fold
more (more preferably not more than 3-fold) different DNA
sequences.

For a mini-protein that lacks α helices and β strands, one
will, in any given round of mutation, preferably variegate each
of 4-6 non-cysteine codons so that they each encode at least
eight of the 20 possible amino acids. The variegation at each
codon could be customized to that position. Preferably,
cysteine is not one of the potential substitutions, though it
is not excluded.

When the mini-protein is a metal finger protein, in a typical
variegation strategy, the two Cys and two His residues, and
optionally also the aforementioned Phe/Tyr, Phe and Leu
residues, are held invariant and a plurality (usually 5-10) of
the other residues are varied.

When the mini-protein is of the type featuring one or more α
helices and β strands, the set of potential amino acid
modifications at any given position is picked to favor those
which are less likely to disrupt the secondary structure at
that position. Since the number of possibilities at each
variable amino acid is more limited, the total number of
variable amino acids may be greater without altering the
sampling efficiency.

For the last-mentioned class of mini-proteins, as well as
domains other than mini-proteins, preferably not more than 20
and more preferably 5-10 codons will be variegated. However, if
diffuse mutagenesis is employed, the number of codons which are
variegated can be higher.

The decision as to which residues to modify is eased by
knowledge of which residues lie on the surface of the domain
and which are buried in the interior.

We choose residues in the PPBD to vary through consideration of
several factors, including: a) the 3D structure of the PPBD, b)
sequences homologous to PPBD, and c) modeling of the PPBD and
mutants of the PPBD. When the number of residues that could
strongly influence binding without preventing the normal
folding of the PPBD is greater than the number that should be
varied simultaneously, the user should pick a subset of those
residues to vary at one time. The user picks trial levels of
variegation and calculates the abundances of various sequences.
The list of varied residues and the level of variegation at
each varied residue are adjusted until the composite
variegation is commensurate with the sensitivity of the
affinity separation and the number of independent transformants
that can be made.

Having picked which residues to vary, we now decide the range
of amino acids to allow at each variable residue. The total
level of variegation is the product of the number of variants
at each varied residue. Each varied residue can have a
different scheme of variegation, producing 2 to 20 different
possibilities. The set of amino acids which are potentially
encoded by a given variegated codon is called its "substitution
set".
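As a small worked example of the product rule just stated, the sketch below multiplies the sizes of the substitution sets chosen for each varied residue. The function name and the example plan are hypothetical; only the rule itself (a product, with 2 to 20 possibilities per residue) comes from the text.

```python
from math import prod


def total_variegation(substitution_set_sizes):
    """Total number of amino-acid sequences for a variegation plan.

    substitution_set_sizes holds, for each varied residue, the number of
    different amino acids allowed there (2 to 20 per the text).
    """
    for size in substitution_set_sizes:
        if not 2 <= size <= 20:
            raise ValueError(f"substitution set size {size} is outside 2-20")
    return prod(substitution_set_sizes)


if __name__ == "__main__":
    # Hypothetical plan: five varied residues with 8, 8, 4, 12 and 15
    # allowed amino acids respectively.
    print(total_variegation([8, 8, 4, 12, 15]))
    # Six residues, each varied through all twenty amino acids.
    print(total_variegation([20] * 6))
```

Comparing this product with the number of independent transformants that can be made is the adjustment step described in the preceding paragraph.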

The computer that controls a DNA synthesizer, such as the
Milligen 7500, can be programmed to synthesize any base of an
oligo-nt with any distribution of nts by taking some nt
substrates (e.g., nt phosphoramidites) from each of two or more
reservoirs. Alternatively, nt substrates can be mixed in any
ratios and placed in one of the extra reservoirs for so-called
"dirty bottle" synthesis. Each codon could be programmed
differently. The "mix" of bases at each nucleotide position of
the codon determines the relative frequency of occurrence of
the different amino acids encoded by that codon.

Simply variegated codons are those in which those nucleotide
positions which are degenerate are obtained from a mixture of
two or more bases mixed in equimolar proportions. These
mixtures are described in this specification by means of the
standardized "ambiguous nucleotide" code. In this code, for
example, in the degenerate codon "SNT", "S" denotes an
equimolar mixture of bases G and C, "N", an equimolar mixture
of all four bases, and "T", the single invariant base
thymidine.

Complexly variegated codons are those in which at least one of
the three positions is filled by a base from an other than
equimolar mixture of two or more bases.

Either simply or complexly variegated codons may be used to
achieve the desired substitution set.
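For a simply variegated codon, the substitution set follows mechanically from the ambiguous nucleotide code and the standard genetic code. The following sketch expands a degenerate codon such as "SNT" into its component codons and tallies the encoded amino acids; the helper names are illustrative, and "*" is used here to denote a stop codon.

```python
from collections import Counter
from itertools import product

# Standard ambiguous-nucleotide code (equimolar mixtures).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon):
    """List every concrete codon covered by a degenerate codon."""
    choices = [IUPAC[base] for base in degenerate_codon.upper()]
    return ["".join(triplet) for triplet in product(*choices)]


def substitution_set(degenerate_codon):
    """Count how many component codons encode each amino acid."""
    return Counter(CODON_TABLE[codon] for codon in expand(degenerate_codon))


if __name__ == "__main__":
    for name in ("SNT", "RRS", "NNT", "NNG", "NNK"):
        counts = substitution_set(name)
        print(name, dict(sorted(counts.items())))
```

Because every component codon of a simply variegated codon is equally likely, the tally gives the relative frequency of each amino acid directly; a complexly variegated codon would need per-base weights instead of a plain count.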
If we have no information indicating that a particular amino
acid or class of amino acid is appropriate, we strive to
substitute all amino acids with equal probability because
representation of one mini-protein above the detectable level
is wasteful. Equal amounts of all four nts at each position in
a codon (NNN) yields the amino acid distribution in which each
amino acid is present in proportion to the number of codons
that code for it. This distribution has the disadvantage of
giving two basic residues for every acidic residue. In
addition, six times as much R, S, and L as W or M occur. If
five codons are synthesized with this distribution, each of the
243 sequences encoding some combination of L, R, and S are
7776 times more abundant than each of the 32 sequences encoding
some combination of W and M. To have five Ws present at
detectable levels, we must have each of the (L,R,S) sequences
present in 7776-fold excess.
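The figures in this paragraph (six-fold, 243, 32 and 7776) follow from counting codons per amino acid in the standard genetic code. A minimal sketch of that arithmetic is given below; the function name is illustrative.

```python
from collections import Counter

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ("*" marks a stop codon). With NNN every codon is equally likely.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS_PER_AA = Counter(AMINO)


def relative_abundance(peptide):
    """Number of NNN-encoded DNA sequences that encode the given peptide."""
    result = 1
    for amino_acid in peptide:
        result *= CODONS_PER_AA[amino_acid]
    return result


if __name__ == "__main__":
    print("codons for L, R, S:", [CODONS_PER_AA[a] for a in "LRS"])
    print("codons for W, M:", [CODONS_PER_AA[a] for a in "WM"])
    print("basic (K+R) vs acidic (D+E):",
          CODONS_PER_AA["K"] + CODONS_PER_AA["R"],
          CODONS_PER_AA["D"] + CODONS_PER_AA["E"])
    print("LLLLL / WWWWW:",
          relative_abundance("LLLLL") // relative_abundance("WWWWW"))
    print("pentapeptides over {L,R,S}:", 3 ** 5, " over {W,M}:", 2 ** 5)
```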

It is generally accepted that the sequence of amino acids in a
protein or polypeptide determines the three-dimensional
structure of the molecule, including the possibility of no
definite structure. Among polypeptides of definite length and
sequence, some have a defined tertiary structure and most do
not.

Particular amino acid residues can influence the tertiary
structure of a defined polypeptide in several ways, including
by:
a) affecting the flexibility of the polypeptide main chain,
b) adding hydrophobic groups,
c) adding charged groups,
d) allowing hydrogen bonds, and
e) forming cross-links, such as disulfides, chelation to metal
   ions, or bonding to prosthetic groups.
Flexibility:

GLY is the smallest amino acid, having two hydrogens attached
to the Cα. Because GLY has no Cβ, it confers the most
flexibility on the main chain. Thus GLY occurs very frequently
in reverse turns, particularly in conjunction with PRO, ASP,
ASN, SER, and THR.

The amino acids ALA, SER, CYS, ASP, ASN, LEU, MET, PHE, TYR,
TRP, ARG, HIS, GLU, GLN, and LYS have unbranched β carbons. Of
these, the side groups of SER, ASP, and ASN frequently make
hydrogen bonds to the main chain and so can take on main-chain
conformations that are energetically unfavorable for the
others. VAL, ILE, and THR have branched β carbons, which makes
the extended main-chain conformation more favorable. Thus VAL
and ILE are most often seen in β sheets. Because the side group
of THR can easily form hydrogen bonds to the main chain, it has
less tendency to exist in a β sheet.

The main chain of proline is particularly constrained by the
cyclic side group. The φ angle is always close to -60°. Most
prolines are found near the surface of the protein.
Charge:

LYS and ARG carry a single positive charge at any pH below
~10.4 or 12.0, respectively. Nevertheless, the methylene
groups, four and three respectively, of these amino acids are
capable of hydrophobic interactions. The guanidinium group of
ARG is capable of donating five hydrogens simultaneously, while
the amino group of LYS can donate only three. Furthermore, the
geometries of these groups are quite different, so that these
groups are often not interchangeable.

ASP and GLU carry a single negative charge at any pH above
~4.5 and 4.6, respectively. Because ASP has but one methylene
group, few hydrophobic interactions are possible. The geometry
of ASP lends itself to forming hydrogen bonds to main-chain
nitrogens, which is consistent with ASP being found very often
in reverse turns and at the beginning of helices. GLU is more
often found in α helices and particularly in the amino-terminal
portion of these helices because the negative charge of the
side group has a stabilizing interaction with the helix dipole
(NICH88, SALI88).

HIS has an ionization pK in the physiological range, viz., 6.2.
This pK can be altered by the proximity of charged groups or of
hydrogen donators or acceptors. HIS is capable of forming bonds
to metal ions such as zinc, copper, and iron.
Hydrogen bonds: 

Aside from the charged amino acids, SER, THR, ASN, GLN, TYR,
and TRP can participate in hydrogen bonds.
Cross links:

The most important form of cross link is the disulfide bond
formed between two thiols, especially the thiols of CYS
residues. In a suitably oxidizing environment, these bonds form
spontaneously. These bonds can greatly stabilize a particular
conformation of a protein or mini-protein. When a mixture of
oxidized and reduced thiol reagents is present, exchange
reactions take place that allow the most stable conformation to
predominate. Concerning disulfides in proteins and peptides,
see also KATZ90, MATS89, PERR84, PERR86, SAUE86, WELL86,
JANA89, HORV89, KISH85, and SCHN86.

Other cross links that form without need of specific enzymes
include:

1) (CYS)4:Fe             Rubredoxin (in CREI84, p. 376)
2) (CYS)4:Zn             Aspartate transcarbamylase (in CREI84,
                         p. 376) and Zn-fingers (HARD90)
3) (HIS)2(MET)(CYS):Cu   Azurin (in CREI84, p. 376) and basic
                         "blue" Cu cucumber protein (GUSS88)
4) (HIS)4:Cu             CuZn superoxide dismutase
5) (CYS)4:(Fe4S4)        Ferredoxin (in CREI84, p. 376)
6) (CYS)2(HIS)2:Zn       Zinc-fingers (GIBS88)
7) (CYS)3(HIS):Zn        Zinc-fingers (GAUS87, GIBS88)

Cross links having (HIS)2(MET)(CYS):Cu have the potential
advantage that HIS and MET cannot form other cross links
without Cu.

Simply Variegated Codons

The following simply variegated codons are useful because they
encode a relatively balanced set of amino acids:

1) SNT, which encodes the set [L,P,H,R,V,A,D,G]: a) one acidic
   (D) and one basic (R), b) both aliphatic (L,V) and aromatic
   hydrophobics (H), c) large (L,R,H) and small (G,A) side
   groups, d) rigid (P) and flexible (G) amino acids, e) each
   amino acid encoded once.
2) RNG, which encodes the set [M,T,K,R,V,A,E,G]: a) one acidic
   and two basic (not optimal, but acceptable), b) hydrophilics
   and hydrophobics, c) each amino acid encoded once.
3) RMG, which encodes the set [T,K,A,E]: a) one acidic, one
   basic, one neutral hydrophilic, b) three favor α helices, c)
   each amino acid encoded once.
4) VNT, which encodes the set [L,P,H,R,I,T,N,S,V,A,D,G]: a) one
   acidic, one basic, b) all classes: charged, neutral
   hydrophilic, hydrophobic, rigid and flexible, etc., c) each
   amino acid encoded once.
5) RRS, which encodes the set [N,S,K,R,D,E,G2]: a) two acidics,
   two basics, b) two neutral hydrophilics, c) only glycine
   encoded twice.
6) NNT, which encodes the set [F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G]:
   a) sixteen DNA sequences provide fifteen different amino
   acids; only serine is repeated, all others are present in
   equal amounts (this allows very efficient sampling of the
   library), b) there are equal numbers of acidic and basic
   amino acids (D and R, once each), c) all major classes of
   amino acids are present: acidic, basic, aliphatic
   hydrophobic, aromatic hydrophobic, and neutral hydrophilic.
7) NNG, which encodes the set
   [L2,R2,S,W,P,Q,M,T,K,V,A,E,G,stop]: a) fair preponderance of
   residues that favor formation of α helices [L,M,A,Q,K,E;
   and, to a lesser extent, S,R,T]; b) encodes 13 different
   amino acids. (VHG encodes a subset of the set encoded by
   NNG, comprising 9 amino acids in nine different DNA
   sequences, with equal acids and bases, and 5/9 being α
   helix-favoring.)

For the initial variegation, NNT is preferred, in most cases.
However, when the codon is encoding an amino acid to be
incorporated into an α helix, NNG is preferred.

Below, we analyze several simple variegations as to the
efficiency with which the libraries can be sampled.

Libraries of random hexapeptides encoded by (NNK)6 have been
reported (SCOT90, CWIR90). Table 130 shows the expected
behavior of such libraries. NNK produces single codons for PHE,
TYR, CYS, TRP, HIS, GLN, ILE, MET, ASN, LYS, ASP, and GLU (α
set); two codons for each of VAL, ALA, PRO, THR, and GLY (Φ
set); and three codons for each of LEU, ARG, and SER (Ω set).
We have separated the 64,000,000 possible sequences into 28
classes, shown in Table 130A, based on the number of amino
acids from each of these sets. The largest class is ΦΩαααα with
~14.6% of the possible sequences. Aside from any selection, all
the sequences in one class have the same probability of being
produced. Table 130B shows the probability that a given DNA
sequence taken from the (NNK)6 library will encode a
hexapeptide belonging to one of the defined classes; note that
only ~6.3% of DNA sequences belong to the ΦΩαααα class.

Table 130C shows the expected numbers of sequences in each
class for libraries containing various numbers of independent
transformants (viz. 10^6, 3·10^6, 10^7, 3·10^7, 10^8, 3·10^8,
10^9, and 3·10^9). At 10^6 independent transformants (ITs), we
expect to see 56% of the ΩΩΩΩΩΩ class, but only 0.1% of the
αααααα class. The vast majority of sequences seen come from
classes for which less than 10% of the class is sampled.
Suppose a peptide from, for example, class ΦΦΩΩαα is isolated
by fractionating the library for binding to a target. Consider
how much we know about peptides that are related to the
isolated sequence. Because only 4% of the ΦΦΩΩαα class was
sampled, we cannot conclude that the amino acids from the Ω set
are in fact the best from the Ω set. We might have LEU at
position 2, but ARG or SER could be better. Even if we isolate
a peptide of the ΩΩΩΩΩΩ class, there is a noticeable chance
that better members of the class were not present in the
library.

With a library of 10^7 ITs, we see that several classes have
been completely sampled, but that the αααααα class is only 1.1%
sampled. At 7.6·10^7 ITs, we expect display of 50% of all
amino-acid sequences, but the classes containing three or more
amino acids of the α set are still poorly sampled. To achieve
complete sampling of the (NNK)6 library requires about 3·10^9
ITs, 10-fold larger than the largest (NNK)6 library so far
reported.
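The class sizes and sampling percentages quoted for the (NNK)6 library can be reproduced approximately with a short calculation. The sketch below is not the calculation behind Tables 130A-C, which are not shown here; it assumes that stop-containing sequences are not displayed (so that 31^6 DNA sequences are equally likely) and that the number of copies of any one DNA sequence among N independent transformants is Poisson distributed. The function names are illustrative.

```python
import math
from math import factorial

# (amino acids in the set, NNK codons per amino acid), as given in the text.
ALPHA, PHI, OMEGA = (12, 1), (5, 2), (3, 3)
LENGTH = 6
DNA_TOTAL = 31 ** LENGTH  # 32 NNK codons minus the TAG stop


def classes(length=LENGTH):
    """One record per class: set counts, peptides in class, DNA per peptide."""
    out = []
    for a in range(length + 1):
        for f in range(length + 1 - a):
            o = length - a - f
            orderings = factorial(length) // (
                factorial(a) * factorial(f) * factorial(o))
            peptides = orderings * ALPHA[0] ** a * PHI[0] ** f * OMEGA[0] ** o
            dna_each = ALPHA[1] ** a * PHI[1] ** f * OMEGA[1] ** o
            out.append({"alpha": a, "phi": f, "omega": o,
                        "peptides": peptides, "dna_each": dna_each})
    return out


def fraction_sampled(dna_each, n_its):
    """Poisson estimate of the fraction of a class present among n_its ITs."""
    return 1.0 - math.exp(-n_its * dna_each / DNA_TOTAL)


def overall_fraction(n_its):
    """Estimated fraction of all amino-acid sequences present in the library."""
    records = classes()
    total = sum(c["peptides"] for c in records)
    seen = sum(c["peptides"] * fraction_sampled(c["dna_each"], n_its)
               for c in records)
    return seen / total


if __name__ == "__main__":
    biggest = max(classes(), key=lambda c: c["peptides"])
    print("number of classes:", len(classes()))
    print("largest class:", biggest)
    for n in (1e6, 1e7, 7.6e7, 3e9):
        print(f"{n:.1e} ITs: omega^6 {fraction_sampled(3 ** 6, n):.3f}, "
              f"alpha^6 {fraction_sampled(1, n):.4f}, "
              f"overall {overall_fraction(n):.3f}")
```

The quantity that matters for a class is the number of DNA sequences encoding each of its peptides (1 for αααααα, 729 for ΩΩΩΩΩΩ), which is why the two extremes are sampled so differently at the same library size.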

Table 131 shows expectations for a library encoded by
(NNT)4(NNG)2. The expectations of abundance are independent of
the order of the codons or of interspersed unvaried codons.
This library encodes 0.133 times as many amino-acid sequences,
but there are only 0.0165 times as many DNA sequences. Thus
5.0·10^7 ITs (i.e., 60-fold fewer than required for (NNK)6)
gives almost complete sampling of the library. The results
would be slightly better for (NNT)6 and slightly, but not much,
worse for (NNG)6. The controlling factor is the ratio of DNA
sequences to amino-acid sequences.

Table 132 shows the ratio of #DNA sequences/#AA sequences for
codons NNK, NNT, and NNG. For NNK and NNG, we have assumed that
the PBD is displayed as part of an essential gene, such as gene
III in Ff phage, as is indicated by the phrase "assuming stops
vanish". It is not in any way required that such an essential
gene be used. If a non-essential gene is used, the analysis
would be slightly different; sampling of NNK and NNG would be
slightly less efficient. Note that (NNT)6 gives 3.6-fold more
amino-acid sequences than (NNK)5 but requires 1.7-fold fewer
DNA sequences. Note also that (NNT)7 gives twice as many
amino-acid sequences as (NNK)6, but 3.3-fold fewer DNA
sequences.
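The comparisons between NNK, NNT and NNG libraries reduce to products of two numbers per codon: the amino acids encoded and the sense codons available. The sketch below recomputes the ratios quoted in the text under the "assuming stops vanish" convention (the single TAG stop of NNK and NNG is not counted). The dictionary and function names are illustrative.

```python
# (amino acids encoded, sense codons) for each variegated codon; the single
# TAG stop of NNK and NNG is dropped ("assuming stops vanish").
CODONS = {"NNK": (20, 31), "NNT": (15, 16), "NNG": (13, 15)}


def library_size(design):
    """design is a list of (codon_name, repeats); returns (aa_seqs, dna_seqs)."""
    aa_seqs, dna_seqs = 1, 1
    for name, repeats in design:
        n_aa, n_dna = CODONS[name]
        aa_seqs *= n_aa ** repeats
        dna_seqs *= n_dna ** repeats
    return aa_seqs, dna_seqs


if __name__ == "__main__":
    nnk6 = library_size([("NNK", 6)])
    nnk5 = library_size([("NNK", 5)])
    mixed = library_size([("NNT", 4), ("NNG", 2)])
    nnt6 = library_size([("NNT", 6)])
    nnt7 = library_size([("NNT", 7)])
    print("(NNT)4(NNG)2 vs (NNK)6: AA x%.3f, DNA x%.4f"
          % (mixed[0] / nnk6[0], mixed[1] / nnk6[1]))
    print("(NNT)6 vs (NNK)5: %.1f-fold more AA, %.1f-fold fewer DNA"
          % (nnt6[0] / nnk5[0], nnk5[1] / nnt6[1]))
    print("(NNT)7 vs (NNK)6: %.1f-fold more AA, %.1f-fold fewer DNA"
          % (nnt7[0] / nnk6[0], nnk6[1] / nnt7[1]))
    for name, (n_aa, n_dna) in CODONS.items():
        print(name, "DNA/AA per codon: %.3f" % (n_dna / n_aa))
```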

Thus, while it is possible to use a simple mixture (NNS, NNK or
NNN) to obtain at a particular position all twenty amino acids,
these simple mixtures lead to a highly biased set of encoded
amino acids. This problem can be overcome by use of complexly
variegated codons.

We first will present the mixture calculated (see WO90/02809)
to minimize the ratio of most favored amino acid to least
favored amino acid when the nt distribution is subject to two
constraints: equal abundances of acidic and basic amino acids
and the least possible number of stop codons. We have
simplified the search for an optimal nt distribution by
limiting the third base to T or G (C or G is equivalent).
However, it should be noted that the present invention embraces
use of complexly variegated codons in which the third base is
not limited to T or G (or to C or G).

The optimum distribution (the "fxS" codon) is shown in Table
10A and yields DNA molecules encoding each type of amino acid
with the abundances shown. Note that this chemistry encodes all
twenty amino acids, with acidic and basic amino acids being
equiprobable, and the most favored amino acid (serine) is
encoded only 2.454 times as often as the least favored amino
acid (tryptophan). The "fxS" vg codon improves sampling most
for peptides containing several of the amino acids
[F,Y,C,W,H,Q,I,M,N,K,D,E] for which NNK or NNS provide only one
codon. Its sampling advantages are most pronounced when the
library is relatively small.

The results of searching only for the complexly variegated
codon which minimizes the ratio of most favored to least
favored amino acid, without additional [...] small, indicating
that insisting on equality of acids and bases and minimizing
stop codons costs us little. Also note that, without
restraining the optimization, the prevalence of acidic and
basic amino acids comes out fairly close. On the other hand,
relaxing the restriction leaves a distribution in which the
least favored amino acid is only .412 times as prevalent as
SER.

The advantages of an NNT codon are discussed elsewhere in the
present application. Unoptimized NNT provides 15 amino acids
encoded by only 16 DNA sequences. It is possible to improve on
NNT with the complexly variegated codon shown in Table 10C.
This gives five amino acids (SER, LEU, HIS, VAL, ASP) in very
nearly equal amounts. A further eight amino acids (PHE, TYR,
ILE, ASN, PRO, ALA, ARG, GLY) are present at 78% the abundance
of SER. THR and CYS remain at half the abundance of SER. When
variegating DNA for disulfide-bonded mini-proteins, it is often
desirable to reduce the prevalence of CYS. This distribution
allows 13 amino acids to be seen at high level and gives no
stops; the optimized fxS distribution allows only 11 amino
acids at high prevalence.

The NNG codon can also be optimized. When equimolar T,C,A,G are
used in NNG, one obtains double doses of LEU and ARG. Table 10D
shows an approximately optimized NNG codon. There are, under
this variegation, four equally most favored amino acids: LEU,
ARG, ALA, and GLU. Note that there is one acidic and one basic
amino acid in this set. There are two equally least favored
amino acids: TRP and MET. The ratio of lfaa/mfaa is 0.5258. If
this codon is repeated six times, peptides composed entirely of
TRP and MET are 2% as common as peptides composed entirely of
the most favored amino acids. We refer to this as "the
optimized NNG".

When synthesizing vgDNA by the "dirty bottle" method, it
is sometimes desirable to use only a limited number of
mixes. One very useful mixture is called the "optimized
mixture", in which we average the first two positions of
the fxS mixture: T1 = 0.24, C1 = 0.17, A1 = 0.33,
G1 = 0.26; the second position is identical to the
first, and C3 = G3 = 0.5. This distribution provides the
amino acids ARG, SER, LEU, GLY, VAL, THR, ASN, and LYS at
greater than 5% plus ALA, ASP, GLU, ILE, MET, and TYR at
greater than 4%.
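The amino-acid frequencies produced by such a base mixture can be estimated by multiplying the per-position base fractions for every codon and summing over synonymous codons. The sketch below assumes the standard genetic code and independent incorporation of bases at each position; the function and variable names are illustrative and do not come from the specification.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}


def codon_distribution(p1, p2, p3):
    """Return {residue: probability} for a codon whose three positions
    are drawn independently from the base mixtures p1, p2 and p3."""
    dist = {}
    for codon, residue in CODON_TABLE.items():
        prob = (p1.get(codon[0], 0.0)
                * p2.get(codon[1], 0.0)
                * p3.get(codon[2], 0.0))
        dist[residue] = dist.get(residue, 0.0) + prob
    return dist


# Base fractions quoted in the text for the "optimized mixture".
first_two = {"T": 0.24, "C": 0.17, "A": 0.33, "G": 0.26}
third = {"C": 0.5, "G": 0.5}

dist = codon_distribution(first_two, first_two, third)
above_5_percent = sorted(r for r, p in dist.items() if r != "*" and p > 0.05)
print(above_5_percent)
```

Under these assumptions the residues above 5% are the eight named in the text (R, S, L, G, V, T, N, K in one-letter code).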

An additional complexly variegated codon is of interest.
This codon is identical to the optimized NNT codon at the
first two positions and has T:G::90:10 at the third
position. This codon provides thirteen amino acids (ALA,
ILE, ARG, SER, ASP, LEU, VAL, PHE, ASN, GLY, PRO, TYR,
and HIS) at more than 5.5%. THR at 4.3% and CYS at 3.9%
are more common than the LFAAs of NNK (3.125%). The
remaining five amino acids are present at less than 1%.
This codon has the feature that all amino acids are
present; sequences having more than two of the
low-abundance amino acids are rare. When we isolate an
SBD using this codon, we can be reasonably sure that the
first 13 amino acids were tested at each position. A
similar codon, based on optimized NNG, could be used.

Several of the preferred simple or complex variegated
codons encode a set of amino acids which includes
cysteine. This means that some of the encoded binding
domains will feature one or more cysteines in addition to
the invariant disulfide-bonded cysteines. For example,
at each NNT-encoded position, there is a one in sixteen
chance of obtaining cysteine. If six codons are so
variegated, roughly one third of the domains
(1 - (15/16)^6, about 0.32) will contain one or more
additional cysteines. Unpaired cysteines can cause
complications.
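The fraction of domains carrying at least one extra cysteine can be worked out directly from the one-in-sixteen chance per NNT position. This is a small sketch assuming independent positions; the function name is illustrative.

```python
def fraction_with_extra_cys(p_cys: float, n_positions: int) -> float:
    """Probability that at least one of n independently variegated
    positions encodes cysteine."""
    return 1.0 - (1.0 - p_cys) ** n_positions


# NNT: one of the sixteen codons (TGT) encodes cysteine.
frac_six_nnt = fraction_with_extra_cys(1 / 16, 6)
print(f"{frac_six_nnt:.3f}")
```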

On the other hand, many disulfide-containing proteins
contain cysteines that do not form disulfides, e.g.
trypsin. The possibility of unpaired cysteines can be
dealt with in several ways:

First, the variegated phage population can be passed
over an immobilized reagent that strongly binds free
thiols, such as SulfoLink (catalogue number 44895 H from
Pierce Chemical Company, Rockford, Illinois, 61105).
Another product from Pierce is TNB-Thiol Agarose
(Catalogue Code 20409 H). BioRad sells Affi-Gel 401
(catalogue 153-4599) for this purpose.

Second, one can use a variegation that excludes
cysteines, such as:

NHT that gives [F,S,Y,L,P,H,I,T,N,V,A,D],
VNS that gives
    [L²,P²,H,Q,R³,I,M,T²,N,K,S,V²,A²,E,D,G²],
NNG that gives [L²,S,W,P,Q,R²,M,T,K,V,A,E,G,stop],
SNT that gives [L,P,H,R,V,A,D,G],
RNG that gives [M,T,K,R,V,A,E,G],
RMG that gives [T,K,A,E],
VNT that gives [L,P,H,R,I,T,N,S,V,A,D,G], or
RRS that gives [N,S,K,R,D,E,G²].

However, each of these schemes has one or more of the
following disadvantages, relative to NNT: a) fewer amino
acids are allowed, b) amino acids are not evenly
provided, c) acidic and basic amino acids are not equally
likely, or d) stop codons occur. Nonetheless, NNG, NHT,
and VNT are almost as useful as NNT. NNG encodes 13
different amino acids and one stop signal. Only two
amino acids appear twice in the 16-fold mix.
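The residue sets listed above can be checked by expanding each degenerate codon with the IUPAC ambiguity codes and translating with the standard genetic code. The following is a sketch under those assumptions; the helper name `encoded_residues` is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): AMINO[i]
               for i, c in enumerate(product(BASES, repeat=3))}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def encoded_residues(scheme: str) -> Counter:
    """Count how many codons of a degenerate scheme encode each residue."""
    codons = ("".join(c) for c in product(*(IUPAC[b] for b in scheme)))
    return Counter(CODON_TABLE[codon] for codon in codons)


CYS_FREE = ["NHT", "VNS", "NNG", "SNT", "RNG", "RMG", "VNT", "RRS"]
for scheme in ["NNT"] + CYS_FREE:
    counts = encoded_residues(scheme)
    n_amino = sum(1 for residue in counts if residue != "*")
    print(scheme, n_amino, "amino acids from", sum(counts.values()),
          "codons;", "stop present" if "*" in counts else "no stop")
```

A count greater than one for a residue corresponds to the multiplicity noted in the lists, for example R appearing three times under VNS.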

Thirdly, one can enrich the population for binding to
the preselected target, and evaluate selected sequences
post hoc for extra cysteines. Those that contain more
cysteines than the cysteines provided for conformational
constraint may be perfectly usable. It is possible that
a disulfide linkage other than the designed one will
occur. This does not mean that the binding domain
defined by the isolated DNA sequence is in any way
unsuitable. The suitability of the isolated domains is
best determined by chemical and biochemical evaluation of
chemically synthesized peptides.

Lastly, one can block free thiols with reagents, such as
Ellman's reagent, iodoacetate, or methyl iodide, that
specifically bind free thiols and that do not react with
disulfides, and then leave the modified phage in the
population. It is to be understood that the blocking
agent may alter the binding properties of the
mini-protein; thus, one might use a variety of blocking
reagents in expectation that different binding domains
will be found. The variegated population of
thiol-blocked display phage are fractionated for binding.
If the DNA sequence of the isolated binding mini-protein
contains an odd number of cysteines, then synthetic means
are used to prepare mini-proteins having each possible
linkage and in which the odd thiol is appropriately
blocked. Nishiuchi (NISH82, NISH86, and works cited
therein) discloses methods of synthesizing peptides that
contain a plurality of cysteines so that each thiol is
protected with a different type of blocking group. These
groups can be selectively removed so that the disulfide
pairing can be controlled. We envision using such a
scheme with the alteration that one thiol either remains
blocked, or is unblocked and then reblocked with a
different reagent.
Use of NNT or NNG variegated codons leads to very
efficient sampling of variegated libraries because the
ratio of (different amino-acid sequences)/(different DNA
sequences) is much closer to unity than it is for NNK or
even the optimized vg codon (fxS). Nevertheless, a few
amino acids are omitted in each case. Both NNT and NNG
allow members of all important classes of amino acids:
hydrophobic, hydrophilic, acidic, basic, neutral
hydrophilic, small, and large. After selecting a binding
domain, a subsequent variegation and selection may be
desirable to achieve a higher affinity or specificity.
During this second variegation, amino acid possibilities
overlooked by the preceding variegation may be
investigated.
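The sampling-efficiency argument can be made concrete by comparing, for a given number of variegated codons, the number of distinct amino-acid sequences with the number of distinct DNA sequences. The sketch below uses the counts given in the text for NNT (15 of 16) and NNG (13 of 16), and assumes the commonly cited figure of 20 amino acids from 32 codons for NNK; six varied positions is an arbitrary illustrative choice.

```python
# (distinct amino acids, distinct codons) per variegated position.
SCHEMES = {
    "NNT": (15, 16),
    "NNG": (13, 16),   # the sixteenth codon class is a stop
    "NNK": (20, 32),   # assumption: 20 amino acids plus one stop in 32 codons
}


def sequence_ratio(n_amino: int, n_codons: int, n_positions: int) -> float:
    """(different amino-acid sequences) / (different DNA sequences)
    for n_positions independently variegated codons."""
    return (n_amino / n_codons) ** n_positions


ratios = {name: sequence_ratio(aa, dna, 6)
          for name, (aa, dna) in SCHEMES.items()}
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

With six positions the ratio stays well above one half for NNT but falls to a few percent for NNK, which is the sense in which NNT samples a library more efficiently.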

In the second round of variegation, a preferred
strategy is to vary each position through a new set of
residues which includes the amino acid(s) which were
found at that position in the successful binding domains,
and which include as many as possible of the residues
which were excluded in the first round of variegation.

Thus, later rounds of variegation test both amino acid
positions not previously mutated, and amino acid
substitutions at a previously mutated position which were
not within the previous substitution set.

If the first round of variegation is entirely
unsuccessful, a different pattern of variegation should
be used. For example, if more than one interaction set
can be defined within a domain, the residues varied in
the next round of variegation should be from a different
set than that probed in the initial variegation. If
repeated failures are encountered, one may switch to a
different IPBD.
AFFINITY SELECTION OF TARGET-BINDING MUTANTS

Affinity separation is used initially in the present
invention to verify that the display system is working,
i.e., that a chimeric outer surface protein has been
expressed and transported to the surface of the phage and
is oriented so that the inserted binding domain is
accessible to target material. When used for this
purpose, the binding domain is a known binding domain for
a particular target and that target is the affinity
molecule used in the affinity separation process. For
example, a display system may be validated by inserting
DNA encoding BPTI into a gene encoding an outer surface
protein of the phage of interest, and testing for binding
to anhydrotrypsin, which is normally bound by BPTI.

If the phage bind to the target, then we have
confirmation that the corresponding binding domain is
indeed displayed by the phage. Phage which display the
binding domain (and thereby bind the target) are
separated from those which do not.

Once the display system is validated, it is possible
to use a variegated population of phage which display a
variety of different potential binding domains, and use
affinity separation technology to determine how well they
bind to one or more targets. This target need not be one
bound by a known binding domain which is parental to the
displayed binding domains, i.e., one may select for
binding to a new target.

For example, one may variegate a BPTI binding domain
and test for binding, not to trypsin, but to another
serine protease, such as human neutrophil elastase or
cathepsin G, or even to a wholly unrelated target, such
as horse heart myoglobin.

The term "affinity separation means" includes, but is
not limited to: a) affinity column chromatography, b)
batch elution from an affinity matrix material, c) batch
elution from an affinity material attached to a plate, d)
fluorescence activated cell sorting, and e)
electrophoresis in the presence of target material.
"Affinity material" is used to mean a material with
affinity for the material to be purified, called the
"analyte". In most cases, the association of the
affinity material and the analyte is reversible so that
the analyte can be freed from the affinity material once
the impurities are washed away.

If affinity chromatography is to be used, then:

1) the molecules of the target material must be of
   sufficient size and chemical reactivity to be
   applied to a solid support suitable for affinity
   separation,
2) after application to a matrix, the target material
   preferably does not react with water,
3) after application to a matrix, the target material
   preferably does not bind or degrade proteins in a
   non-specific way, and
4) the molecules of the target material must be
   sufficiently large that attaching the material to a
   matrix allows enough unaltered surface area
   (generally at least 500 Å², excluding the atom that
   is connected to the linker) for protein binding.

Affinity chromatography is the preferred separation
means, but FACS, electrophoresis, or other means may also
be used.

The present invention makes use of affinity separation
of phage to enrich a population for those phage carrying
genes that code for proteins with desirable binding
properties.

The present invention may be used to select for
binding domains which bind to one or more target
materials, and/or fail to bind to one or more target
materials. Specificity, of course, is the ability of a
binding molecule to bind strongly to a limited set of
target materials, while binding more weakly or not at all
to another set of target materials from which the first
set must be distinguished.

Almost any molecule that is suitable for affinity
separation may be used as a target. Possible targets
include, but are not limited to, peptides, soluble and
insoluble proteins, nucleic acids, lipids, carbohydrates,
other organic molecules (monomeric or polymeric),
inorganic compounds, and organometallic compounds.
Serine proteases are an especially interesting class of
potential target materials.

For chromatography, FACS, or electrophoresis there may
be a need to covalently link the target material to a
second chemical entity. For chromatography the second
entity is a matrix, for FACS the second entity is a
fluorescent dye, and for electrophoresis the second
entity is a strongly charged molecule. In many cases, no
coupling is required because the target material already
has the desired property of: a) immobility, b)
fluorescence, or c) charge. In other cases, chemical or
physical coupling is required.

It is not necessary that the actual target material be
used in preparing the immobilized or labeled analogue
that is to be used in affinity separation; rather,
suitable reactive analogues of the target material may be
more convenient. Target materials that do not have
reactive functional groups may be immobilized by first
creating a reactive functional group through the use of
some powerful reagent, such as a halogen. In some cases,
the reactive groups of the actual target material may
occupy a part on the target molecule that is to be left
undisturbed. In that case, additional functional groups
may be introduced by synthetic chemistry.

Two very general methods of immobilization are widely
used. The first is to biotinylate the compound of
interest and then bind the biotinylated derivative to
immobilized avidin. The second method is to generate
antibodies to the target material, immobilize the
antibodies by any of numerous methods, and then bind the
target material to the immobilized antibodies. Use of
antibodies is more appropriate for larger target
materials; small targets (those comprising, for example,
ten or fewer non-hydrogen atoms) may be so completely
engulfed by an antibody that very little of the target is
exposed in the target-antibody complex.

Non-covalent immobilization of hydrophobic molecules
without resort to antibodies may also be used. A
compound, such as 2,3,3-trimethyldecane, is blended with
a matrix precursor, such as sodium alginate, and the
mixture is extruded into a hardening solution. The
resulting beads will have 2,3,3-trimethyldecane dispersed
throughout and exposed on the surface.

Other immobilization methods depend on the presence of
particular chemical functionalities. A polypeptide will
present -NH2 (N-terminal; Lysines), -COOH (C-terminal;
Aspartic Acids; Glutamic Acids), -OH (Serines;
Threonines; Tyrosines), and -SH (Cysteines). A
polysaccharide has free -OH groups, as does DNA, which
has a sugar backbone.

The following table is a nonexhaustive review of
reactive functional groups and potential immobilization
reagents:

Group            Reagent

R-NH2            Derivatives of 2,4,6-trinitrobenzene
                 sulfonates (TNBS) (CREI84, p. 11)
R-NH2            Carboxylic acid anhydrides, e.g.,
                 derivatives of succinic anhydride,
                 maleic anhydride, citraconic anhydride
                 (CREI84, p. 11)
R-NH2            Aldehydes that form reducible Schiff
                 bases (CREI84, p. 12)
guanido          Cyclohexanedione derivatives (CREI84,
                 p. 14)
R-CO2H           Diazo cmpds (CREI84, p. 10)
R-CO2-           Epoxides (CREI84, p. 10)
R-OH             Carboxylic acid anhydrides
Aryl-OH          Carboxylic acid anhydrides
indole ring      Benzyl halide and sulfenyl halides
                 (CREI84, p. 19)
R-SH             N-alkylmaleimides (CREI84, p. 21)
R-SH             Ethyleneimine derivatives (CREI84,
                 p. 21)
R-SH             Aryl mercury compounds (CREI84)
R-SH             Disulfide reagents (CREI84, p. 23)
Thiol ethers     Alkyl iodides (CREI84, p. 20)
ketones          Make Schiff's base and reduce with
                 NaBH4 (CREI84, p. 12-13)
Aldehydes        Oxidize to COOH, vide supra.
R-SO3H           Convert to R-SO2Cl and react with
                 immobilized alcohol or amine.
R-PO3H           Convert to R-PO2Cl and react with
                 immobilized alcohol or amine.
CC double bonds  Add HBr and then make amine or thiol.

The extensive literature on affinity chromatography
and related techniques will provide further examples.

Matrices suitable for use as support materials include
polystyrene, glass, agarose and other chromatographic
supports, and may be fabricated into beads, sheets,
columns, wells, and other forms as desired. Suppliers of
support material for affinity chromatography include:
Applied Protein Technologies, Cambridge, MA; Bio-Rad
Laboratories, Rockville Center, NY; Pierce Chemical
Company, Rockford, IL. Target materials are attached to
the matrix in accord with the directions of the
manufacturer of each matrix preparation with
consideration of good presentation of the target.
Early in the selection process, relatively high
concentrations of target materials may be applied to the
matrix to facilitate binding; target concentrations may
subsequently be reduced to select for higher affinity
SBDs.

The population of display phage is applied to an
affinity matrix under conditions compatible with the
intended use of the binding protein and the population is
fractionated by passage of a gradient of some solute over
the column. The process enriches for PBDs having
affinity for the target and for which the affinity for
the target is least affected by the eluants used. The
enriched fractions are those containing viable display
phage that elute from the column at greater concentration
of the eluant.

The eluants preferably are capable of weakening
noncovalent interactions between the displayed PBDs and
the immobilized target material. Preferably, the eluants
do not kill the phage; the genetic message corresponding
to successful mini-proteins is most conveniently
amplified by reproducing the phage rather than by in
vitro procedures such as PCR. The list of potential
eluants includes salts (including Na+, Rb+, SO4--,
H2PO4-, citrate, K+, Li+, Cs+, HSO4-, CO3--, Ca++, Sr++,
Cl-, PO4---, HCO3-, Mg++, Ba++, Br-, HPO4-- and acetate),
acid, heat, compounds known to bind the target, and
soluble target material (or analogues thereof).

Neutral solutes, such as ethanol, acetone, ether, or
urea, are frequently used in protein purification and are
known to weaken non-covalent interactions between
proteins and other molecules. Many of these species are,
however, very harmful to bacteria and bacteriophage.
Urea is known not to harm M13 up to 8 M. Salt is a
preferred solute for gradient formation in most cases.
Decreasing pH is also a highly preferred eluant. In some
cases, the preferred matrix is not stable to low pH so
that salt and urea are the most preferred reagents.
Other solutes that generally weaken non-covalent
interaction between proteins and the target material of
interest may also be used.
The uneluted display phage contain DNA encoding
binding domains which have a sufficiently high affinity
for the target material to resist the elution conditions.
The DNA encoding such successful binding domains may be
recovered in a variety of ways. Preferably, the bound
display phage are simply eluted by means of a change in
the elution conditions. Alternatively, one may culture
the phage in situ, or extract the target-containing
matrix with phenol (or other suitable solvent) and
amplify the DNA by PCR or by recombinant DNA techniques.
Or, if a site for a specific protease has been engineered
into the display vector, the specific protease is used to
cleave the binding domain from the GP.

Variation in the support material (polystyrene, glass,
agarose, cellulose, etc.) in analysis of clones carrying
SBDs is used to distinguish phage that bind to the
support material rather than the target.

The harvested phage are now enriched for the
binding-to-target phenotype by use of affinity separation
involving the target material immobilized on an affinity
matrix. Phage that fail to bind to the target material
are washed away. It may be desirable to include a
bacteriocidal agent, such as azide, in the buffer to
prevent bacterial growth. The buffers used in
chromatography include: a) any ions or other solutes
needed to stabilize the target, and b) any ions or other
solutes needed to stabilize the PBDs derived from the
IPBD.

Recovery of phage that display binding to an affinity
column is typically achieved by collecting fractions
eluted from the column with a gradient of a chaotropic
agent as described above, or of the target material in
soluble form; fractions eluting later in the gradient are
enriched for high-affinity phage. The eluted phage are
then amplified in suitable host cells.

If some high-affinity phage cannot be eluted from the
target in viable form, one may:

1) flood the matrix with a nutritive medium and grow
   the desired phage in situ,
2) remove parts of the matrix and use them to inoculate
   growth medium,
3) chemically or enzymatically degrade the linkage
   holding the target to the matrix so that GPs still
   bound to target are eluted, or
4) degrade the phage and recover DNA with phenol or
   other suitable solvent; the recovered DNA is used to
   transform cells that regenerate GPs.

It is possible to utilize combinations of these methods.
It should be remembered that what we want to recover from
the affinity matrix is not the phage per se, but the
information in them as to the sequence of the successful
epitope or binding domain.

As described in WO90/02809, one may modify the
affinity separation of the method described to select a
molecule that binds to material A but not to material B,
or that binds to both A and B at competing or
noncompeting sites, or that does not bind to selected
targets.

SUBSEQUENT PRODUCTION

Using the method of the present invention, we can
obtain a replicable phage that displays a novel protein
domain having high affinity and specificity for a target
material of interest. Such a phage carries both
amino-acid embodiments of the binding protein domain and
a DNA embodiment of the gene encoding the novel binding
domain. The presence of the DNA facilitates expression
of a protein comprising the novel binding protein domain
within a high-level expression system, which need not be
the same system used during the developmental process.

We can proceed to production of the novel binding
protein in several ways, including: a) altering of the
gene encoding the binding domain so that the binding
domain is expressed as a soluble protein, not attached to
a phage (either by deleting codons 5' of those encoding
the binding domain or by inserting stop codons 3' of
those encoding the binding domain), b) moving the DNA
encoding the binding domain into a known expression
system, and c) utilizing the phage as a purification
system. (If the domain is small enough, it may be
feasible to prepare it by conventional peptide synthesis
methods.)

As previously mentioned, an advantage inhering from
the use of a mini-protein as an IPBD is that it is likely
that the derived SBD will also behave like a mini-protein
and will be obtainable by means of chemical synthesis.
(The term "chemical synthesis", as used herein, includes
the use of enzymatic agents in a cell-free environment.)

Peptides may be chemically synthesized either in
solution or on supports. Various combinations of
stepwise synthesis and fragment condensation may be
employed.

During synthesis, the amino acid side chains are
protected to prevent branching. Several different
protective groups are useful for the protection of the
thiol groups of cysteines:

1) 4-methoxybenzyl (MBzl; Mob) (NISH82; ZAFA88),
   removable with HF;
2) acetamidomethyl (Acm) (NISH82; NISH86; BECK89C),
   removable with iodine; mercury ions (e.g., mercuric
   acetate); silver nitrate; and
3) S-para-methoxybenzyl.
Other thiol protective groups may be found in standard
reference works such as Greene, PROTECTIVE GROUPS IN
ORGANIC SYNTHESIS (1981).

Once the polypeptide chain has been synthesized,
disulfide bonds must be formed. Possible oxidizing
agents include air (NISH86), ferricyanide (NISH82),
iodine (NISH82), and performic acid. Temperature, pH,
solvent, and chaotropic chemicals may affect the course
of the oxidation.

A large number of mini-proteins with a plurality of
disulfide bonds have been chemically synthesized in
biologically active form.

The successful binding domains of the present
invention may, alone or as part of a larger protein, be
used for any purpose for which binding proteins are
suited, including isolation or detection of target
materials. In furtherance of this purpose, the novel
binding proteins may be coupled directly or indirectly,
covalently or noncovalently, to a label, carrier or
support.

When used as a pharmaceutical, the novel binding
proteins may be contained with suitable carriers or
adjuvants.

***** 

All references cited anywhere in this specification are
incorporated by reference to the extent which they may be
pertinent.



WO 92/15679 



PCT/US92/01539 




All cells used in the following examples are E. coli 
cells. 

EXAMPLE I 

DISPLAY OF BPTI AS A FUSION TO M13 GENE VIII PROTEIN

Example I involves display of BPTI on M13 as a fusion
to the mature gene VIII coat protein. Each DNA
construction was confirmed by restriction digestion and
DNA sequencing.

1. Construction of the viii-signal-sequence::bpti::
mature-viii-coat-protein Display Vector.

The operative cloning vectors are M13 and phagemids
derived from M13 or f1. The initial construction was in
the f1-based phagemid pGEM-3Zf(-) (Promega Corp.,
Madison, WI.).

We constructed a gene encoding, in order: i) a
modified lacUV5 promoter, ii) a Shine-Dalgarno sequence,
iii) M13 gene VIII signal sequence, iv) mature BPTI, v)
mature M13 gene VIII coat protein, vi) multiple stop
codons, and vii) a transcription terminator. This gene
is illustrated in Table 102. The operator of lacUV5 is
the symmetrical lacO to allow tighter repression in the
absence of IPTG. The longest segment that is identical
to wild-type gene VIII is minimized so that genetic
recombination with the co-existing gene VIII is unlikely.

i) OCV based upon pGEM-3Zf.

pGEM-3Zf(-) (Promega Corp., Madison, WI.) is a vector
containing the amp gene, bacterial origin of replication,
bacteriophage f1 origin of replication, a lacZ operon
containing a multiple cloning site sequence, and the T7
and SP6 polymerase binding sequences.

BamHI and SalI sites were introduced at the boundaries
of the lacZ operon (to facilitate removal of the lacZ
operon and its replacement with the synthetic gene); this
vector is named pGEM-MB3/4.

ii) OCV based upon M13mp18.

M13mp18 (YANI85) is a vector (New England Biolabs,
Beverly, MA.) consisting of the whole of the phage genome
plus a lacZ operon containing a multiple cloning site
(MESS77). BamHI and SalI sites were introduced into
M13mp18 at the 5' and 3' ends of the lacZ operon; this
vector is named M13-MB1/2.
B) Synthetic Gene.

A synthetic gene (VIII-signal-sequence::mature-bpti::
mature-VIII-coat-protein) was constructed from 16
synthetic oligonucleotides, synthesized by Genetic
Designs Inc., of Houston, Texas, via a method similar to
those [...]. The oligonucleotides were phosphorylated,
with the exception of the 5' most molecules, using
standard methods, annealed and ligated in stages. The
overhangs were filled in with T4 DNA polymerase and the
DNA was cloned into the HincII site of pGEM-3Zf(-); the
initial construct is pGEM-MB1. Double-stranded DNA of
pGEM-MB1 was cut with PstI, filled in with T4 DNA
polymerase and ligated to a SalI linker (New England
BioLabs) so that the synthetic gene is bounded by BamHI
and SalI sites (Table 102). The synthetic gene was
obtained on a BamHI-SalI cassette and cloned into
pGEM-MB3/4 and M13-MB1/2 using the introduced BamHI and
SalI sites, to generate pGEM-MB16 and M13-MB15,
respectively. The synthetic insert was sequenced. The
original Ribosome Binding Site (RBS) was in error (AGAGG
instead of the designed AGGAGG) and we detected no
expressed protein in vivo or in vitro.
C) Alterations to the synthetic gene.
i) Ribosome binding site (RBS).

In pGEM-MB16, a SacI-NheI DNA fragment (containing the
RBS) was replaced with an oligonucleotide encoding a new
RBS very similar to the RBS of E. coli that is known to
function.

Original putative RBS (5'-to-3')

GAGCTCagaggCTTACTATGAAGAAATCTCTGGTT
|SacI|                       |NheI|

New RBS (5'-to-3')

GAGCTCTggaggaAATAAAATGAAGAAATCT
|SacI|                   |NheI|

The putative RBSs are lower case and the initiating
methionine codon is underscored and bold. The resulting
construct is pGEM-MB20. In vitro expression of the gene
carried by pGEM-MB20 produced a novel protein species of
the expected size.
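The difference between the two ribosome binding sites can be illustrated by finding the longest stretch of the designed Shine-Dalgarno sequence (AGGAGG) present in each construct. This is only a sketch: the two sequences below are transcribed from the passage above up to the initiating ATG, so their exact bases should be treated as an assumption, and the helper name is hypothetical.

```python
DESIGNED_SD = "AGGAGG"  # designed Shine-Dalgarno sequence named in the text

# Sequences from the SacI site through the initiating ATG, as transcribed
# from the passage above (an assumption; verify against Table 102).
ORIGINAL_RBS_REGION = "GAGCTCagaggCTTACTATG".upper()
NEW_RBS_REGION = "GAGCTCTggaggaAATAAAATG".upper()


def longest_shared_run(sequence: str, motif: str) -> str:
    """Return the longest contiguous piece of `motif` found in `sequence`
    (the leftmost piece of the motif when several have equal length)."""
    for length in range(len(motif), 0, -1):
        for start in range(len(motif) - length + 1):
            piece = motif[start:start + length]
            if piece in sequence:
                return piece
    return ""


original_match = longest_shared_run(ORIGINAL_RBS_REGION, DESIGNED_SD)
new_match = longest_shared_run(NEW_RBS_REGION, DESIGNED_SD)
print("original:", original_match, len(original_match))
print("new:     ", new_match, len(new_match))
```

A longer match to the designed sequence in the new construct is consistent with the reported recovery of expression, though the code itself says nothing about translation efficiency.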

ii) tac promoter.

To obtain higher expression of the fusion protein, the
lacUV5 promoter was changed to a tac promoter. In
pGEM-MB16, a BamHI-HpaII fragment (containing the lacUV5
promoter) was replaced with an oligonucleotide containing
the -35 sequence of the trp promoter (cf. RUSS82),
converting the lacUV5 promoter to tac. The vector is
named pGEM-MB22.



MB16
        BamHI                -35                  HpaII
  5'- GATCC tctagagtcggc TTTACA ctttatgcttc(cggctcg...-3'
  3'-     G agatctcagccg AAATGT gaaatacgaaggc(cgagc...-5'

MB22
        BamHI                    -35
  5'- GATCC actccccatccccctg TTGACA attaatcat   -3'
  3'-     G tgaggggtagggggac AACTGT taattagtagc -5'
                                            (HpaII)
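The effect of the promoter swap on the -35 region can be summarized by comparing each hexamer with the E. coli sigma-70 consensus TTGACA. The sketch below takes the two hexamers from the MB16 and MB22 diagrams above; the consensus value and the helper names are supplied here for illustration and are not part of the specification.

```python
CONSENSUS_MINUS_35 = "TTGACA"  # widely cited E. coli sigma-70 -35 consensus

MINUS_35 = {
    "MB16 (lacUV5)": "TTTACA",
    "MB22 (tac)": "TTGACA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def complement(seq: str) -> str:
    """Base-by-base complement, read in the same left-to-right order
    as the lower strand is drawn in the diagrams."""
    return seq.translate(_COMPLEMENT)


for name, hexamer in MINUS_35.items():
    print(name, hexamer, "mismatches to consensus:",
          mismatches(hexamer, CONSENSUS_MINUS_35),
          "lower strand:", complement(hexamer))
```

The complement check reproduces the lower-strand hexamers shown in the diagrams (AAATGT and AACTGT), which is a quick way to confirm that the two strands were transcribed consistently.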

Promoter and RBS variants of the fusion protein gene
were constructed as follows:

            Promoter   RBS   Encoded Protein

pGEM-MB16   lac        old   VIIIs.p.-BPTI-matureVIII
pGEM-MB20   lac        new       "
pGEM-MB22   tac        old       "
pGEM-MB26   tac        new       "
The synthetic genes from pGEM-MB20 and pGEM-MB26 were
recloned into the altered phage vector M13-MB1/2 to
generate the phage M13-MB27 and M13-MB28 respectively.

iii) Signal Peptide Sequence.

In vitro expression of the synthetic gene regulated by
tac and the "new" RBS produced a novel protein of the
expected size for the unprocessed protein (~16 kd). In
vivo expression also produced novel protein of full size;
no processed protein could be seen on phage or in cell
extracts by silver staining or by Western analysis with
anti-BPTI antibody.
-7 Table t -106 shows ; a number of :jypical signal sequences, 

Charged residues are generally -thought to be of great - 

15 importance and are shown bold and underscored' Each 
signal sequence contains a long stretch of uncharged 
residues that are mostly hydrophobic; these are shown^in,- 
•.. - lower ; case V Kt the rigiit, in parentheses , is the length 

— . : qf the stretch of uncharged-residues. We note that the,; 

-20 fusions of- gene ~3£in signal- to BPTI- and gene III signal 

to BPTI have rather short uncharged segments. These 
short uncharged segments may reduce or prevent processing 
of the fusion peptides. We know that the gene III signal 
sequence is capable of directing: a) insertion of the 

25 peptide comprising (mature-BPTI) : : (mature -gene -III- 
protein) into the lipid bilayer, and b) translocation of 
BPTI and most of the mature gene III protein across the 
lipid bilayer ( vide infra) . That the gene III remains 
anchored in the lipid bilayer until the phage is 
30 assembled is directed by the uncharged anchor region near 
the carboxy terminus of the mature gene III protein (see 
Table 116) and not by the secretion signal sequence. The 
phoA signal sequence can direct secretion of mature BPTI 
into the periplasm of R*_ fifili (MARK86) . Furthermore, 
35 there is controversy over the mechanism by which mature 
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authentic gene VTII protein comes to be in the lipid 
- bilayer prior to phage assembly. 

Thus we replaced the DNA coding for the gene-VIII
putative signal sequence by each of DNA coding for: 1)
the phoA signal sequence, 2) the bla signal sequence, and
3) the M13 gene III signal. Each of these replacements
produces a tripartite gene encoding a fusion protein that
comprises, in order: (a) a signal peptide that directs
secretion into the periplasm of parts (b) and (c),
derived from a first gene; (b) an initial potential
binding domain (BPTI in this case) derived from a second
gene (in this case, the second gene is an animal gene);
and (c) a structural packaging signal (the mature gene
VIII coat protein), derived from a third gene.

The process by which the IPBD::packaging-signal fusion
arrives on the phage surface is illustrated in Figure 1.

In Figure 1a, we see that authentic gene VIII protein
appears (by whatever process) in the lipid bilayer so
that both the amino and carboxy termini are in the
cytoplasm. Signal peptidase-I cleaves the gene VIII
protein, liberating the signal peptide (that is absorbed
by the cell) and mature gene VIII coat protein that spans
the lipid bilayer. Many copies of mature gene VIII coat
protein accumulate in the lipid bilayer awaiting phage
assembly (Figure 1c). Some signal sequences are able to
direct the translocation of quite large proteins across
the lipid bilayer. If additional codons are inserted
after the codons that encode the cleavage site of the
signal peptidase-I of such a potent signal sequence, the
encoded amino acids will be translocated across the lipid
bilayer as shown in Figure 1b. After cleavage by signal
peptidase-I, the amino acids encoded by the added codons
will be in the periplasm but anchored to the lipid
bilayer by the mature gene VIII coat protein, Figure 1d.
The circular single-stranded phage DNA is extruded
through a part of the lipid bilayer containing a high
concentration of mature gene VIII coat protein; the
carboxy terminus of each coat protein molecule packs near
the DNA while the amino terminus packs on the outside.
Because the fusion protein is identical to mature gene
VIII coat protein within the trans-bilayer domain, the
fusion protein will co-assemble with authentic mature
gene VIII coat protein as shown in Figure 1e.

In each case, the mature VIII coat protein moiety is
intended to co-assemble with authentic mature VIII coat
protein to produce phage particles having BPTI domains
displayed on the surface. The source and character of
the secretion signal sequence is not important because
the signal sequence is cut away and degraded. The
structural packaging signal, however, is quite important
because it must co-assemble with the authentic coat
protein to make a working virus sheath.
a) Bacterial Alkaline Phosphatase (phoA) Signal Peptide.

Construct pGEM-MB26 contains a SacI-AccIII fragment
containing the new RBS and sequences encoding the
initiating methionine and the signal peptide of M13 gene
VIII pro-protein. This fragment was replaced with a
duplex (annealed from four oligo-nts) containing the RBS
and DNA coding for the initiating methionine and signal
peptide of PhoA (INOU82); phage is pGEM-MB42. M13MB48 is
a derivative of GemMB42. A BamHI-SalI DNA fragment from
GemMB42, containing the gene construct, was ligated into
a similarly cleaved vector M13MB1/2 giving rise to
M13MB48.

PhoA RBS and signal peptide sequence

5'-GAGCTCCATGGGAGAAAATAAA.ATG.AAA.CAA.AGC.ACG.-
   | SacI |               met lys gln ser thr

  .ATC.GCA.CTC.TTA.CCG.TTA.CTG.TTT.ACC.CCT.GTG.ACA.-
   ile ala leu leu pro leu leu phe thr pro val thr

  .AAA.GCC.CGT.CCG.GAT.-3'
   lys ala arg pro asp
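As a consistency check on the PhoA sequence listed above, the coding portion can be translated with the standard genetic code and compared with the amino acids printed beneath it. The following is a minimal sketch in Python; the names CODON_TABLE, translate and PHOA_SIGNAL_ORF are introduced here for illustration only and are not part of the original disclosure.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering
# of first, second and third codon positions.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '.' and spaces are ignored."""
    cleaned = dna.upper().replace(".", "").replace(" ", "")
    if len(cleaned) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(
        CODON_TABLE[cleaned[i:i + 3]] for i in range(0, len(cleaned), 3)
    )


# Coding portion of the listing above, from the initiating ATG
# through the final GAT codon.
PHOA_SIGNAL_ORF = (
    "ATG.AAA.CAA.AGC.ACG."
    "ATC.GCA.CTC.TTA.CCG.TTA.CTG.TTT.ACC.CCT.GTG.ACA."
    "AAA.GCC.CGT.CCG.GAT"
)

if __name__ == "__main__":
    print(translate(PHOA_SIGNAL_ORF))
```

The one-letter output can be read against the three-letter line under each codon row; any disagreement would point to a transcription error in the listing.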

b) β-lactamase signal peptide.

To allow transfer of the β-lactamase (amp) promoter and
DNA coding for the signal peptide into the (mature-
BPTI)::(mature-VIII-coat-protein) gene, we first
introduced an AccIII site into the amp gene adjacent to
the codons for the β-lactamase signal peptide cleavage
site (C25or»T and Aasoi-KS); vector is pGEM-MB40. We then
ligated a BamHI linker into the AatII site at nt 2260, 5'
to the promoter; vector is pGEM-MB45. The BamHI-AccIII
fragment now contains the amp promoter, amp RBS,
initiating methionine and β-lactamase signal peptide.
This fragment was used to replace the corresponding
fragment from pGEM-MB26 to generate construct pGEM-MB46.

amp gene promoter and signal peptide sequences

5'-GGATCCGGTGGCACTTTTCGGGGAAATGTC

   TATTTTTCTAAATAOITTCAAATATC

   CTGATAAATGCTTC^TAATATTGAAAAAGGAAGAGT

   ATG.AGT.ATT.CAA.CAT.TTC.CGT.GTC.GC^ -
   met ser ile gln his phe arg val ala leu ile pro phe phe

   GCG.GCA.TTT.TGC.CTT.CCT.GTT.TTT.GCT.CAT.CCG.-3'
   ala ala phe cys leu pro val phe ala his pro ....

c) M13-gene-III-signal::bpti::mature-VIII-coat-protein

We may also construct M13-MB51 which would carry a Km^R
gene and a M13-gene-III-signal-peptide gene fragment
fused to the previously described BPTI::mature-VIII-coat-
protein gene fragment. Because M13-MB51 contains no gene
III, the phage can not form plaques, but can render cells
Km^R. Infectious phage particles can be obtained via
helper phage. The gene III signal sequence is capable of
directing (BPTI)::(mature-gene-III-protein) to the
surface of phage (vide infra).

Summary of signal peptide fusion protein variants.

                                 Signal     Fusion
             Promoter   RBS      sequence   protein

pGEM-MB26    tac        new      VIII       BPTI/VIII-coat
pGEM-MB42    tac        new      phoA       BPTI/VIII-coat
pGEM-MB46    amp        amp      amp        BPTI/VIII-coat
pGEM-MB51                        III        BPTI/VIII-coat
(hypoth.)
M13 MB48     tac        new      phoA       BPTI/VIII-coat

D. Analysis of the Protein Products Encoded by the
Synthetic (signal-peptide::mature-bpti::viii-coat-
protein) Genes.

i) In vitro analysis.

A coupled transcription/translation prokaryotic system
(Amersham Corp., Arlington Heights, IL) was utilized for
the in vitro analysis of the protein products encoded by
the BPTI/VIII synthetic gene and the derivatives.

Table 107 lists the protein products encoded by the
listed vectors which are visualized by standard
fluorography following in vitro synthesis in the presence
of 35S-methionine and separation of the products using SDS
polyacrylamide gel electrophoresis. In each sample, a
pre-β-lactamase product (~31 kd) can be seen. This is
derived from the amp gene which is the common selection
gene for the vectors. In addition, a (pre-BPTI/VIII)
product encoded by the synthetic gene and variants can be
seen as indicated. The migration of these species (~14.5
kd) is consistent with the expected size of the encoded
proteins.

ii) In vivo analysis.

The vectors detailed in sections (B) and (C) were
freshly transfected into the E. coli strain XL1-Blue™
(Stratagene, La Jolla, CA) and into strain SEF'. E. coli
strain SE6004 (LISS85) carries the prlA4 mutation and is
more permissive in secretion than strains that carry the
wild-type prlA. SE6004 is F- and is deleted for lacI; thus
the cells can not be infected by M13 and lacUV5 and tac
promoters can not be regulated with IPTG. Strain SEF' is
derived from strain SE6004 (LISS85) by crossing with XL1-
Blue™; the F' in XL1-Blue™ carries Tc^R and lacI^q.
SE6004 is streptomycin^R, Tc^S while XL1-Blue™ is
streptomycin^S, Tc^R so that both parental strains can be
killed with the combination of streptomycin and
tetracycline. SEF' retains the secretion-permissive
phenotype of the parental strain, SE6004 (prlA4).

The fresh transfectants were grown in NZYCM medium
(SAMB89) for 1 hour. IPTG was added over the range 1.0
µM to 0.5 mM (to derepress lacUV5 and tac) and the
cultures were grown for an additional 1.5 hours.

Aliquots of cells expressing the synthetic-insert
encoded proteins together with controls (no vector, mock
vector, and no IPTG) were lysed in SDS gel loading buffer
and electrophoresed in 20% polyacrylamide gels containing
SDS and urea. Duplicate gels were either silver stained
(Daiichi, Tokyo, Japan) or electrotransferred to a nylon
matrix (Immobilon from Millipore, Bedford, MA) for
western analysis using rabbit anti-BPTI polyclonal
antibodies.

Table 108 lists the interesting proteins seen by silver
staining and western analysis of identical gels. We can
see clearly by western analysis that IPTG-inducible
protein species containing BPTI epitopes exist in the
test strains which are absent from the control strains.
In XL1-Blue™, the migration of this species is
predominantly that of the unprocessed form of the
pro-protein although a small proportion of the protein
appears to migrate at a size consistent with that of a
fully processed form. In SEF', the processed form
predominates, there being only a faint band corresponding
to the unprocessed species.
Thus, in strain SEF', we have produced a tripartite
fusion protein that is specifically cleaved after the
secretion signal sequence. We believe that the mature
protein comprises BPTI followed by the gene VIII coat
protein and that the coat protein moiety spans the
membrane. One or more copies, perhaps hundreds of
copies, of this protein will co-assemble into M13 derived
phage or M13-like phagemids. This construction will
allow us to a) mutagenize the BPTI domain, b) display
each of the variants on the coat of one or more phage
(one type per phage), and c) recover those phage that
display variants having novel binding properties with
respect to target materials of our choice.
Rasched and Oberer (RASC86) report that phage produced
in cells that express two alleles of gene VIII, that have
differences within the first 11 residues of the mature
coat protein, contain some of each protein. Thus,
because we have achieved in vivo processing of the
phoA(signal)::bpti::matureVIII fusion gene, it is highly
likely that co-expression of this gene with wild-type
VIII will lead to production of phage bearing BPTI
domains on their surface. Mutagenesis of the bpti domain
of these genes will provide a population of phage, each
phage carrying a gene that codes for the variant of BPTI
displayed on the phage surface.
VIII Display Phage: Production, Preparation and Analysis.

i. Phage Production.

The OCV can be grown in XL1-Blue™ in the absence of
IPTG. Typically, a plaque plug is taken from a plate and
grown in 2 ml of medium, containing freshly diluted
cells, for 6 to 8 hours. Following centrifugation of
this culture, the supernatant is titered. This is kept
as a phage stock for further infection, phage production,
and display of the gene product of interest.

A 100-fold dilution of a fresh overnight culture of SEF'
cells in 500 ml of NZCYM medium is allowed to grow to a
cell density of 0.4 (Ab 600nm) in a shaker at 37°C. To
this culture is added a sufficient amount of the phage
stock to give a MOI of 10 together with IPTG to give a
final concentration of 0.5 mM. The culture is allowed to
grow for a further 2 hrs.
ii. Phage Preparation and Purification.

The phage-producing bacterial culture is centrifuged to
separate the phage in the supernatant from the bacterial
pellet. To the supernatant is added one quarter, by
volume, of phage precipitation solution (20% PEG, 3.75 M
ammonium acetate) and PMSF to a final concentration of
1 mM. It is left on ice for 2 hours after which the
precipitated phage is retrieved by centrifugation. The
phage pellet is redissolved in Tris-EDTA containing 0.1%
Sarkosyl and left at 4°C for 1 hour after which any
bacteria and bacterial debris are removed by
centrifugation. The phage in the supernatant is
reprecipitated with PEG overnight at 4°C. The phage
pellet is resuspended in LB medium and reprecipitated
another two times to remove the detergent. The phage is
stored in LB medium at 4°C, titered, and used for
analysis and binding studies.
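The final PEG and ammonium acetate concentrations in the precipitation step follow from simple dilution arithmetic: adding one quarter volume of stock to one volume of supernatant dilutes the stock five-fold. A minimal sketch is given below; the function name final_concentration is ours and purely illustrative.

```python
def final_concentration(stock: float, added_fraction: float) -> float:
    """Concentration of a stock component after adding `added_fraction`
    volumes of stock to one volume of sample (which contains none of it)."""
    if added_fraction <= 0:
        raise ValueError("added_fraction must be positive")
    return stock * added_fraction / (1.0 + added_fraction)


# One quarter volume of 20% PEG, 3.75 M ammonium acetate.
peg_percent = final_concentration(20.0, 0.25)
ammonium_acetate_molar = final_concentration(3.75, 0.25)

if __name__ == "__main__":
    print(f"PEG: {peg_percent:.2f} %")
    print(f"ammonium acetate: {ammonium_acetate_molar:.2f} M")
```

Under these assumptions the working concentrations are roughly 4% PEG and 0.75 M ammonium acetate; the calculation ignores any volume contributed by the PMSF addition.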

A more stringent phage purification scheme involves
centrifugation in a CsCl gradient (3.86 g of CsCl
dissolved in NET buffer (0.1 M NaCl, 1 mM EDTA, 0.1 M Tris
pH 7.7) to make 10 ml). 10^12 to 10^13 phage in TE Sarkosyl
buffer are mixed with 5 ml of CsCl NET buffer and
centrifuged overnight at 34K rpm in, for example, a
Sorvall OTD-65B Ultracentrifuge. Aliquots of 400 µl are
carefully removed. 5 µl aliquots are removed from the
fractions and analysed by agarose gel electrophoresis
after heating at 65°C for 15 minutes together with the
gel loading buffer containing 0.1% SDS. Fractions
containing phage are pooled, the phage reprecipitated and
finally redissolved in LB medium to a concentration of
10^11 to 10^13 phage per ml.
iii. Phage Analysis.

The display phage are analyzed using standard methods
of polyacrylamide gel electrophoresis and either silver
staining of the gel or electrotransfer to a nylon matrix
followed by analysis with anti-BPTI antiserum (Western
analysis). Display of heterologous proteins is
quantitated by comparison to serial dilutions of the
starting protein, for example BPTI, together with the
display phage samples in the electrophoresis and Western
analyses. An alternative method involves running a 2-
fold serial dilution of a phage in which both the major
coat protein and the fusion protein are silver stained.
Comparison of the ratios of the two protein species
allows one to estimate the number of fusion proteins per
phage since the number of VIII-gene encoded proteins per
phage (~3000) is known.

Incorporation of fusion protein into bacteriophage.

In vivo expression of the processed BPTI:VIII fusion
protein, encoded by vectors GemMB42 and M13MB48,
indicated that the processed fusion product is probably
located within the cell membrane. Thus, it could be
incorporated into the phage, and the BPTI moiety
would then be displayed at the phage surface.

SEF' cells were infected with either M13MB48 or M13mp18,
as control. The resulting phage were electrophoresed
(~10^11 phage per lane) in a 20% polyacrylamide gel
containing urea followed by electrotransfer to a nylon
matrix and western analysis using anti-BPTI rabbit serum.
A single species of protein was observed in phage derived
from infection with the M13MB48 stock phage which was not
observed in the control infection. This protein migrated
at an apparent size of ~12 kd, consistent with that of
the fully processed fusion protein.

Western analysis of SEF' bacterial lysate with or
without phage infection demonstrated another species of
protein of about 20 kd. This species was also present, to
a lesser degree, in phage preparations which were simply
PEG precipitated without further purification (for
example, using nonionic detergent or by CsCl gradient
centrifugation). A comparison of M13MB48 phage
preparations made in the presence or absence of detergent
demonstrated that sarkosyl treatment and CsCl gradient
purification removed the bacterial contaminant while
having no effect on the presence of the BPTI:VIII fusion
protein. This indicates that the fusion protein has been
incorporated and is a constituent of the phage body.

The time course of phage production and BPTI:VIII
incorporation was followed post-infection and after IPTG
induction. Phage production and fusion protein
incorporation appear to be maximal after two hours.
This time course was utilized in further phage
productions and analyses.

Polyacrylamide electrophoresis of the phage prepara-
tions, followed by silver staining, demonstrated that the
preparations were essentially free of contaminating
protein species and that an extra protein band was
present in M13MB48 derived phage which was not present in
the control phage. The size of the new protein was
consistent with that seen by western analysis. A similar
analysis of a serially diluted BPTI:VIII incorporated
phage demonstrated that the ratio of fusion protein to
major coat protein was typically about 1:150. Since the
phage contains about 3000 copies of the gene VIII
product, the phage population contains, on average, tens
of copies of fusion protein per phage.
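The estimate of "tens of copies" can be made explicit. A minimal sketch, assuming the approximate figures quoted above (about 3000 major coat protein copies per phage and a fusion:coat ratio of about 1:150); the function name is illustrative only.

```python
def fusion_copies_per_phage(coat_copies: float, coat_per_fusion: float) -> float:
    """Estimate fusion proteins per phage from the number of major coat
    protein copies and the observed coat:fusion ratio (coat per one fusion)."""
    if coat_per_fusion <= 0:
        raise ValueError("coat_per_fusion must be positive")
    return coat_copies / coat_per_fusion


# Approximate values quoted in the text: ~3000 copies, ratio ~1:150.
estimate = fusion_copies_per_phage(3000, 150)

if __name__ == "__main__":
    print(f"about {estimate:.0f} fusion proteins per phage")
```

Because both inputs are approximate, the result (about 20) should be read as an order-of-magnitude figure, consistent with the statement above.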

Altering the initiating methionine of the natural gene
VIII.

The OCV M13MB48 contains the synthetic gene encoding the
BPTI:VIII fusion protein in the intergenic region of the
modified M13mp18 phage vector. The remainder of the
vector consists of the M13 genome. To increase the phage
incorporation of the fusion protein, we decided to
diminish the production of the natural gene VIII product
by altering the initiating methionine codon of this gene
to CTG. In such cases, methionine is incorporated, but
the rate of initiation is reduced. The change was
achieved by site-specific oligonucleotide mutagenesis.

               M   K   K   S  -rest of VIII

-rest of XI-   T   S   S  stop

Site-specific mutagenesis.

              (L)  K   K   S  -rest of VIII

   ACT.TCC.AG.CTG.AAA.AAG.TCT.

-rest of XI-   T   S   S  stop

Analyses of the phage derived from this modified vector
indicated that there was a significant increase in the
ratio of fusion protein to major coat protein.
Quantitative estimates indicated that within a phage
population as many as 100 copies of the BPTI:VIII fusion
were incorporated per phage.
Display of BPTI:VIII fusion protein by bacteriophage.

The BPTI:VIII fusion protein had been shown to be
incorporated into the body of the phage. This phage was
analyzed further to demonstrate that the BPTI moiety was
accessible to specific antibodies and hence displayed at
the phage surface.

We added purified polyclonal rabbit anti-BPTI IgG to a
known titer of phage. Following incubation, protein A-
agarose beads are added to bind IgG and left to incubate
overnight. The IgG-protein A beads and any bound phage
are removed by centrifugation followed by a retitering of
the supernatant to determine loss of phage. The phage
bound to the beads can also be acid eluted and titered.
The assay includes controls, such as a WT phage stock
(M13mp18) and IgG purified from normal rabbit pre-immune
serum.

Table 140 shows that while the titer of the WT phage is
unaltered by anti-BPTI IgG, BPTI-IIIMK (positive control,
vide infra) demonstrated a significant drop in titer
with or without the extra addition of protein A beads.
(Note, since the BPTI moiety is part of gIIIp that binds
phage to bacterial pili, this is expected.) Two batches
of M13MB48 phage (containing the BPTI:VIII fusion
protein) demonstrated a significant reduction in titer,
as judged by pfus, when anti-BPTI antibodies and protein
A beads were added. The initial drop in titer with the
antibody alone differs somewhat between the two batches
of phage. Retrieval of the immunoprecipitated phage,
while not quantitative, was significant when compared to
the WT phage control.

Further controls are shown in Table 141 and Table 142.
The data demonstrated that the loss in titer observed for
the BPTI:VIII containing phage is a result of the display
of BPTI epitopes by these phage and the specific
interaction with anti-BPTI antibodies. No significant
interaction with either protein A agarose beads or IgG
purified from normal rabbit serum could be demonstrated.
The larger drop in titer for M13MB48 batch five reflects
the higher level incorporation of the fusion protein in
this preparation.

Functionality of the BPTI moiety in the BPTI-VIII display
phage.

The previous two sections demonstrated that the
BPTI:VIII fusion protein has been incorporated into the
phage body and that the BPTI moiety is displayed at the
phage surface. To demonstrate that the displayed
molecule is functional, binding experiments were
performed in a manner almost identical to that described
in the previous section except that proteases were used
in place of antibodies. The display phage and controls
are allowed to interact with immobilized proteases or
immobilized inactivated proteases. Binding is assessed
by the loss in titer of the display phage or by
determining the phage bound to the beads.
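Both read-outs described above reduce to simple ratios of titers. The sketch below shows one way such figures could be computed; the function names and the example titers are hypothetical and are not taken from the tables of this application.

```python
def fraction_lost(titer_before: float, titer_after: float) -> float:
    """Fraction of plaque-forming units removed from the supernatant."""
    if titer_before <= 0:
        raise ValueError("titer_before must be positive")
    return 1.0 - titer_after / titer_before


def relative_binding(display_bound: float, display_input: float,
                     control_bound: float, control_input: float) -> float:
    """Bound fraction of display phage divided by bound fraction of
    non-display control phage, both measured on the same beads."""
    display_fraction = display_bound / display_input
    control_fraction = control_bound / control_input
    return display_fraction / control_fraction


if __name__ == "__main__":
    # Hypothetical titers (pfu), for illustration only.
    print(fraction_lost(1.0e9, 3.0e8))
    print(relative_binding(5.0e6, 1.0e9, 1.0e5, 1.0e9))
```

Expressing binding relative to a non-display control, as in the second function, is what allows statements of the form "N times better than the negative control" to be compared across bead types.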

Table 143 shows the results of an experiment in which
BPTI:VIII display phage, M13MB48, were allowed to bind to
anhydrotrypsin-agarose beads. There was a significant
drop in titer compared to WT phage (no displayed BPTI).
A pool of phage (5AA Pool), each containing a variegated 5
amino acid extension at the BPTI:major coat protein
interface, demonstrated a similar decline in titer. In
control (Table 143), very little non-specific binding of
the display phage was observed with agarose beads
carrying an unrelated protein (streptavidin).

Actual binding of the display phage is demonstrated by
the data shown for two experiments in Table 144. The
negative control is M13mp18 and the positive control is
BPTI-IIIMK, a phage in which the BPTI moiety, attached to
the gene III protein, has been shown to be displayed and
functional. M13MB48 and M13MB56 both bind to
anhydrotrypsin beads in a manner comparable to that of
the positive control, being 40 to 60 times better than
the negative control (non-display phage). Hence,
functionality of the BPTI moiety, in the major coat
fusion protein, was established.

Furthermore, Table 145 compares binding to active and
inactivated trypsin by phage. The control phage, M13mp18
and BPTI-IIIMK, demonstrated binding similar to that
detailed elsewhere in the present application. Note that
the relative binding is enhanced with trypsin due to the
apparent marked reduction in the non-specific binding of
the WT phage to the active protease. M13.3X7 and
M13.3X11, which each contain 'EGGGS' linker extensions at
the domain interface, bound to anhydrotrypsin and trypsin
in a manner similar to BPTI-IIIMK phage. The binding,
relative to non-display phage, was ~100-fold higher in
the anhydrotrypsin binding assay and at least 1000-fold
higher in the trypsin binding assay. The binding of
another 'EGGGS' linker variant (M13.3Xd) was similar to
that of M13.3X7.

To demonstrate the specificity of binding, the assays
were repeated with human neutrophil elastase (HNE) beads
and compared to that seen with trypsin beads (Table 146).
BPTI has a very high affinity for trypsin and a low
affinity for HNE, hence the BPTI display phage should
reflect these affinities when used in binding assays with
these beads. The negative and positive controls for
trypsin binding were as already described above while an
additional positive control for the HNE beads,
BPTI(K15L,MGNG)-III MA, was included. The results, shown
in Table 146, confirmed this prediction. M13MB48,
M13.3X7 and M13.3X11 phage demonstrated good binding to
trypsin, relative to WT phage and the HNE control
(BPTI(K15L,MGNG)-III MA), being comparable to BPTI-IIIMK
phage. Conversely, poor binding occurred when HNE beads
were used, with the exception of the HNE positive control
phage.

Taken together the accumulated data demonstrated that
when BPTI is part of a fusion protein with the major coat
protein of M13 phage, the molecule is both displayed at 
the surface of the phage and a significant proportion of 
it is functional in a specific protease binding manner. 

*** 

EXAMPLE IX

CONSTRUCTION OF BPTI/GENE-III DISPLAY VECTOR

DNA manipulations were conducted according to standard
procedures as described in Maniatis et al. (MANI82) or
Sambrook et al. (SAMB89). First the lacZ gene of M13-
MB1/2 was removed. M13-MB1/2 RF was cut with BamHI and
SalI and the large fragment was isolated. The recovered
6819 bp fragment was filled in with Klenow enzyme,
ligated to a HindIII 8mer linker, and used to transfect
XL1-Blue™ (Stratagene, La Jolla, CA) cells which were
subsequently plated for plaque formation. RF DNA was
prepared from chosen plaques and a clone, M13-MB1/2-
delta, containing regenerated BamHI and SalI sites and a
new HindIII site, all 500 bp upstream of the BglII site
(6935), was picked.
A unique NarI site was introduced into codons 17 and 18
of gene III (changing the amino acids from H-S to G-A,
Cf. Table 110) in M13-MB1/2-delta:

      13  14  15  16  17  18  19  20  21
 5'-ct ttc tat tct cac tcc gct gaa ac-3'   wild-type
 3'-ga aag ata aga ccg cgg cga ctt tg-5'
 5'-ct ttc tat tct ggc gcc gct gaa ac-3'   mutant
     P  F   Y   S   G   A   A   E   T
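The effect of the mutation shown above can be checked directly: the mutant strand should contain the NarI recognition sequence (GGCGCC), the wild-type strand should not, and codons 17 and 18 should change from H-S to G-A. A minimal sketch follows; the variable names and the small four-entry codon map are ours, chosen only to cover the codons involved.

```python
NARI_SITE = "ggcgcc"

# Sequences as listed above; codons 14 to 20 are complete, the flanking
# "ct" and "ac" are partial codons 13 and 21.
wild_type = "ct ttc tat tct cac tcc gct gaa ac".split()
mutant = "ct ttc tat tct ggc gcc gct gaa ac".split()

# Only the four codons involved in the change (standard genetic code).
CODON_TO_AA = {"cac": "H", "tcc": "S", "ggc": "G", "gcc": "A"}

wild_type_seq = "".join(wild_type)
mutant_seq = "".join(mutant)

# Number of single-base differences between the two strands.
base_changes = sum(a != b for a, b in zip(wild_type_seq, mutant_seq))

# Codons 17 and 18 are the fifth and sixth blocks in the listing.
wild_type_17_18 = "".join(CODON_TO_AA[c] for c in wild_type[4:6])
mutant_17_18 = "".join(CODON_TO_AA[c] for c in mutant[4:6])

if __name__ == "__main__":
    print("NarI site in wild type:", NARI_SITE in wild_type_seq)
    print("NarI site in mutant:   ", NARI_SITE in mutant_seq)
    print("base changes:", base_changes)
    print("codons 17-18:", wild_type_17_18, "->", mutant_17_18)
```

The check covers only the 25-nucleotide window shown; uniqueness of the site in the whole vector is an experimental observation and is not addressed by this sketch.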

The presence of a unique NarI site at nucleotide 1630 was
confirmed; the new vector is M13-MB1/2-delta-NarI.
Phage MK was made by cloning the 1.3 Kb BamHI Km^R
fragment from plasmid pUC4K (Pharmacia, Piscataway, NJ)
into M13-MB1/2-delta-NarI. Phage MK grows as well as
wild-type M13, indicating that the changes at the
cleavage site of gene III protein are not detectably
deleterious to the phage.

INSERTION OF SYNTHETIC BPTI GENE

The BPTI-III expression vector was constructed by
standard means. The synthetic bpti-VIII fusion contains
a NarI site that comprises the last two codons of the
BPTI-encoding region. A second NarI site was introduced
upstream of the BPTI-encoding region by ligating the
adaptor shown to AccIII-cut M13-MB26:

5'-TATTCTGGCGCCCGT-3'
3'-ATAAGACCGCGGGCAGGCC-5'
         |NarI|   |AccIII
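The adaptor can be checked for internal consistency: the two strands should be complementary over the duplex region, the top strand should carry the NarI sequence (GGCGCC), and the unpaired end of the bottom strand should match the four-base 5' overhang (CCGG) that AccIII, commonly listed as cutting T^CCGGA, is expected to leave. The sketch below assumes that cut pattern; the names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement (no reversal)."""
    return seq.translate(COMPLEMENT)


top_5to3 = "TATTCTGGCGCCCGT"         # upper strand, written 5' to 3'
bottom_3to5 = "ATAAGACCGCGGGCAGGCC"  # lower strand, written 3' to 5'

# Region of the lower strand that pairs with the upper strand.
paired_region = bottom_3to5[:len(top_5to3)]

# Unpaired end of the lower strand, rewritten 5' to 3'.
overhang_5to3 = bottom_3to5[len(top_5to3):][::-1]

if __name__ == "__main__":
    print("strands complementary:", complement(top_5to3) == paired_region)
    print("NarI site present:    ", "GGCGCC" in top_5to3)
    print("5' overhang:          ", overhang_5to3)
```

Only one end of the adaptor is cohesive with AccIII-cut DNA, which is consistent with its use to place a NarI site immediately next to the AccIII site.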

The ligation sample was then restricted with NarI and a
180 bp DNA fragment encoding BPTI was isolated. RF DNA
of phage MK was digested with NarI, dephosphorylated, and
ligated to the 180 bp fragment. Ligation samples were
used to transfect XL1-Blue™ which were plated for Km^R
plaques. DNA, isolated from phage derived from plaques,
was tested for hybridization to a 32P-phosphorylated
double stranded DNA probe corresponding to the BPTI gene.
Large scale RF preparations were made for clones
exhibiting a strong hybridization signal. Restriction
enzyme digestion analysis confirmed the insertion of a
single copy of the synthetic BPTI gene into gene III of
MK to generate phage MK-BPTI. Subsequent DNA sequencing
confirmed that the sequence of the bpti-III fusion gene
is correct and that the correct reading frame is
maintained. Table 116 shows the entire coding region, the
translation into protein sequence, and the functional
parts of the polypeptide chain.

EXPRESSION OF THE BPTI-III FUSION GENE IN VITRO

MK-BPTI RF DNA was added to a coupled prokaryotic
transcription-translation extract (Amersham). Newly
synthesized radiolabelled proteins were produced and
subsequently separated by electrophoresis on a 15% SDS-
polyacrylamide gel. The MK-BPTI DNA directs the
synthesis of an unprocessed gene III fusion protein which
is 7 Kd larger than the WT gene III protein, consistent
with insertion of 58 amino acids of BPTI into gene III
protein. We immunoprecipitated radiolabelled proteins
from the cell-free prokaryotic extract. Neither rabbit
anti(M13-gene-VIII-protein) IgG nor normal rabbit IgG
was able to immunoprecipitate the gene III protein
encoded by either MK or MK-BPTI. However, rabbit
anti-BPTI IgG is able to precipitate the gene III protein
encoded by MK-BPTI but not by MK. This confirms that the
increase in size of the III protein encoded by MK-BPTI is
attributable to the insertion of the BPTI protein.
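The size shift can be sanity-checked with the usual rule of thumb of roughly 110 daltons per amino acid residue. This average is our assumption (an approximation, not a value from this application), and the function name is illustrative.

```python
# Rule-of-thumb average residue mass; an approximation, not a measured value.
AVERAGE_RESIDUE_MASS_DA = 110.0


def approx_mass_kd(n_residues: int,
                   average_residue_mass: float = AVERAGE_RESIDUE_MASS_DA) -> float:
    """Rough polypeptide mass in kilodaltons from its residue count."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return n_residues * average_residue_mass / 1000.0


bpti_insert_kd = approx_mass_kd(58)

if __name__ == "__main__":
    print(f"58 residues: about {bpti_insert_kd:.1f} Kd")
```

An insert of about 6.4 Kd is compatible with the observed shift of roughly 7 Kd, given the limited resolution of SDS-polyacrylamide gels.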

WESTERN ANALYSIS

Phage were recovered from cultures by PEG precipitation.
To remove residual cells, recovered phage were
resuspended in a high salt buffer and centrifuged, as per
instructions for the MUTA-GENE(R) M13 in vitro Mutagenesis
Kit (Catalogue Number 170-3571, Bio-Rad, Richmond, CA).
Aliquots of phage (containing up to 40 µg of protein)
were electrophoresed on a 12.5% SDS-urea-polyacrylamide
gel and proteins were electro-transferred to a sheet of
Immobilon. Western blots were developed using rabbit
anti-BPTI serum, which had previously been incubated with
an E. coli extract, followed by goat anti-rabbit antibody.
An immunoreactive protein of 67 Kd is detected in
preparations of the MK-BPTI but not the MK phage. The
size of the immunoreactive protein is consistent with the
predicted size of a processed BPTI-III fusion protein
(6.4 Kd plus 60 Kd). These data indicate that
BPTI-specific epitopes are present on the MK-BPTI phage
but not the MK phage.

NEUTRALIZATION OF PHAGE TITER WITH AGAROSE-IMMOBILIZED
ANHYDRO-TRYPSIN

Anhydro-trypsin is a derivative of trypsin having the
active site serine converted to dehydroalanine. Anhydro-
trypsin retains the specific binding of trypsin but not
the protease activity. Unlike polyclonal antibodies,
anhydro-trypsin is not expected to bind unfolded BPTI or
incomplete fragments.

Phage MK-BPTI and MK were diluted to a concentration
of 1.4·10^12 particles per ml in TBS buffer (PARM88)
containing 1.0 mg/ml BSA. 30 µl of diluted phage were
added to 2, 5, or 10 µl of a 50% slurry of agarose-
immobilized anhydro-trypsin (Pierce Chemical Co.,
Rockford, IL) in TBS/BSA buffer. Following incubation at
25°C, aliquots were removed, diluted in ice cold LB broth
and titered for plaque-forming units on a lawn of XL1-
Blue™ cells. Table 114 shows that incubation of the
MK-BPTI phage with immobilized anhydro-trypsin results in
a very significant loss in titer over a four hour period
while no such effect is observed with the MK (control)
phage. The reduction in phage titer is also proportional
to the amount of immobilized anhydro-trypsin added to the
MK-BPTI phage. Incubation with 5 µl of a 50% slurry of
agarose-immobilized streptavidin (Sigma, St. Louis, MO)
in TBS/BSA buffer does not reduce the titer of either the
MK-BPTI or MK phage. These data are consistent with the
presentation of a correctly-folded, functional BPTI
protein on the surface of the MK-BPTI phage, but not on
the MK phage. Unfolded or incomplete BPTI domains are
not expected to bind anhydro-trypsin. Furthermore,
unfolded BPTI domains are expected to be non-specifically
sticky.

NEUTRALIZATION OF PHAGE TITER WITH ANTI-BPTI ANTIBODY

MK-BPTI and MK phage were diluted to a concentration of
4 × 10^8 plaque-forming units per ml in LB broth. 15 µl of
diluted phage were added to an equivalent volume of
either rabbit anti-BPTI serum or normal rabbit serum
(both diluted 10-fold in LB broth). Following incubation
at 37°C, aliquots were removed, diluted by 10^4 in ice-cold
LB broth and titered for pfus on a lawn of XL1-Blue™ cells.
Incubation of the MK-BPTI phage with anti-BPTI serum
results in a steady loss in titer over a two hour period,
while no such effect is observed with the MK phage. As
expected, normal rabbit serum does not reduce the titer
of either the MK-BPTI or the MK phage. Prior incubation
of the anti-BPTI serum with authentic BPTI protein, but
not with an equivalent amount of E. coli protein, blocks
the ability of the serum to reduce the titer of the
MK-BPTI phage. These data are consistent with the
presentation of BPTI-specific epitopes on the surface of
the MK-BPTI phage but not the MK phage. More specifically,
the data indicate that these BPTI epitopes are
associated with the gene III protein and that association
of this fusion protein with an anti-BPTI antibody blocks
its ability to mediate the infection of cells.
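
The titer comparisons in these neutralization experiments reduce to simple arithmetic on plaque counts. The following Python sketch illustrates that arithmetic only; the plaque counts, plated volume, and function names are hypothetical and are not taken from Table 114 or any other table of this specification.

```python
def titer_pfu_per_ml(plaques: int, dilution_factor: float, plated_volume_ml: float) -> float:
    """Back-calculate the titer of the undiluted sample from one plate."""
    if dilution_factor <= 0 or plated_volume_ml <= 0:
        raise ValueError("dilution factor and plated volume must be positive")
    return plaques * dilution_factor / plated_volume_ml


def percent_loss_in_titer(titer_before: float, titer_after: float) -> float:
    """Percent of infective phage lost during an incubation."""
    if titer_before <= 0:
        raise ValueError("starting titer must be positive")
    return 100.0 * (1.0 - titer_after / titer_before)


# Hypothetical plate counts: aliquots diluted by 10^4, 0.1 ml plated.
before = titer_pfu_per_ml(plaques=200, dilution_factor=1e4, plated_volume_ml=0.1)
after = titer_pfu_per_ml(plaques=60, dilution_factor=1e4, plated_volume_ml=0.1)
loss = percent_loss_in_titer(before, after)
print(f"titer before: {before:.2e} pfu/ml, after: {after:.2e} pfu/ml, loss: {loss:.0f}%")
```

The same two helpers apply to the phage incubated with control serum or control beads, so that the loss seen with the display phage can be compared against the loss seen with the control phage.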

NEUTRALIZATION OF PHAGE TITER WITH TRYPSIN

MK-BPTI and MK phage were diluted to a concentration of
4 × 10^8 plaque-forming units per ml in LB broth. Diluted
phage were added to an equivalent volume of trypsin
diluted to various concentrations in LB broth. Following
incubation at 37°C, aliquots were removed, diluted by 10^4
in ice-cold LB broth and titered for plaque-forming units
on a lawn of XL1-Blue™ cells. Incubation of the MK-BPTI
phage with 0.15 µg of trypsin results in a 70% loss in
titer, while a smaller loss is observed for MK phage.
A reduction in the amount of trypsin added to phage
results in a reduction in the loss of titer. However, at
all trypsin concentrations investigated, the MK-BPTI
phage are more sensitive to incubation with trypsin than
the MK phage. Thus, association of the BPTI-III fusion
protein displayed on the surface of the MK-BPTI phage
with trypsin blocks its ability to mediate the infection
of cells.

The reduction in titer of phage MK by trypsin is an
example of a phenomenon that is probably general:
proteases, if present in sufficient quantity, will
degrade proteins on the phage and reduce infectivity.




AFFINITY SELECTION SYSTEM 



Affinity Selection with Immobilized Anhydro-Trypsin

MK-BPTI and MK phage were diluted to a concentration of
1.4 × 10^12 particles per ml in TBS buffer (PARM88)
containing 1.0 mg/ml BSA. We added 4.0 × 10^10 phage to 5 µl
of a 50% slurry of either agarose-immobilized
anhydro-trypsin beads (Pierce Chemical Co.) or
agarose-immobilized streptavidin beads (Sigma) in TBS/BSA.
Following a 3 hour incubation at room temperature, the
beads were pelleted by centrifugation for 30 seconds at
5000 rpm in a microfuge and the supernatant fraction was
collected. The beads were washed 5 times with TBS/Tween
buffer (PARM88) and after each wash the beads were
pelleted by centrifugation and the supernatant was
removed. Finally, beads were resuspended in elution
buffer (0.1 N HCl containing 1.0 mg/ml BSA adjusted to pH
2.2 with glycine) and, following a 5 minute incubation at
room temperature, the beads were pelleted by
centrifugation. The supernatant was removed and neutralized
by the addition of 1.0 M Tris-HCl buffer, pH 8.0.

Aliquots of phage samples were applied to a Nytran
membrane using a Schleicher and Schuell (Keene, NH)
minifold apparatus. Phage DNA was immobilized onto
the Nytran by baking at 80°C for 2 hours. The baked
filter was incubated at 42°C for 1 hour in pre-wash
solution (MANI82) and pre-hybridization solution
(5Prime-3Prime, West Chester, PA). The 1.0 Kb NarI (base
1630)/XmnI (base 2646) DNA fragment from MK RF was
radioactively labelled with 32P-dCTP using an
oligolabelling kit (Pharmacia, Piscataway, NJ). The radioactive
probe was added to the Nytran filter in hybridization
solution (5Prime-3Prime) and, following overnight
incubation at 42°C, the filter was washed and autoradiographed.
The efficiency of this affinity selection system can be
semi-quantitatively determined using a dot-blot
procedure. Exposure of MK-BPTI-phage-treated
anhydro-trypsin beads to elution buffer releases bound MK-BPTI
phage. Streptavidin beads do not retain phage MK-BPTI.
Anhydro-trypsin beads do not retain phage MK. In the
experiment depicted in Table 115, we estimate that 20% of
the total MK-BPTI phage were bound to 5 µl of the
immobilized anhydro-trypsin and were subsequently
recovered by washing the beads with elution buffer (pH
2.2 HCl/glycine). Under the same conditions, no
detectable MK-BPTI phage were bound and subsequently
recovered from the streptavidin beads. The amount of
MK-BPTI phage recovered in the elution fraction is
proportional to the amount of immobilized anhydro-trypsin
added to the phage. No detectable MK phage were bound to
either the immobilized anhydro-trypsin or streptavidin
beads and no phage were recovered with elution buffer.
These data indicate that the affinity selection system
described above can be utilized to select for phage
displaying a specific folded protein (in this case,
BPTI). Unfolded or incomplete BPTI domains are not
expected to bind anhydro-trypsin.

MK-BPTI and MK phage were diluted to a concentration of
1 × 10^10 particles per ml in Tris-buffered saline solution
(PARM88) containing 1.0 mg/ml BSA. Two × 10^[…] phage were
added to 2.5 µg of either biotinylated rabbit anti-BPTI
IgG in TBS/BSA or biotinylated rabbit anti-mouse antibody
IgG (Sigma) in TBS/BSA, and incubated overnight at 4°C.
A 50% slurry of streptavidin-agarose, washed
three times with TBS buffer prior to incubation with 30
mg/ml BSA in TBS buffer for 60 minutes at room
temperature, was washed three times with TBS/Tween buffer
(PARM88) and resuspended to a final concentration of 50%
in this buffer. Samples containing phage and
biotinylated IgG were diluted with TBS/Tween prior to the
addition of streptavidin-agarose in TBS/Tween buffer.
Following a 60 minute incubation at room temperature,
streptavidin-agarose beads were pelleted by
centrifugation for 30 seconds and the supernatant fraction was
collected. The beads were washed 5 times with TBS/Tween
buffer and after each wash, the beads were pelleted by
centrifugation and the supernatant was removed. Finally,
the streptavidin-agarose beads were resuspended in
elution buffer (0.1 N HCl containing 1.0 mg/ml BSA
adjusted to pH 2.2 with glycine), incubated 5 minutes at
room temperature, and pelleted by centrifugation. The
supernatant was removed and neutralized by the addition
of 1.0 M Tris-HCl buffer, pH 8.0.

Aliquots of phage samples were applied to a Nytran
membrane using a Schleicher and Schuell minifold
apparatus. Phage DNA was immobilized onto the Nytran by
baking at 80°C for 2 hours. Filters were washed for 60
minutes in pre-wash solution (MANI82) at 42°C, then
incubated at 42°C for 60 minutes in Southern
pre-hybridization solution (5Prime-3Prime). The 1.0 Kb NarI
(1630 bp)/XmnI (2646 bp) DNA fragment from MK RF was
radioactively labelled with 32P-αdCTP using an
oligolabelling kit (Pharmacia, Piscataway, NJ). Nytran
membranes were transferred from pre-hybridization
solution to Southern hybridization solution
(5Prime-3Prime) at 42°C. The radioactive probe was added to the
hybridization solution and, following overnight incubation
at 42°C, the filter was washed with 2 × SSC, 0.1%
SDS at room temperature and once at 65°C in 2 × SSC, 0.1%
SDS. Nytran membranes were subjected to autoradiography.

The efficiency of the affinity selection system can be
semi-quantitatively determined using the above dot blot
procedure. Comparison of dots A1 and B1 or C1 and D1
indicates that the majority of phage did not stick to the
streptavidin-agarose beads. Washing with TBS/Tween
buffer removes the majority of phage which are
non-specifically associated with streptavidin beads.
Exposure of the streptavidin beads to elution buffer
releases bound phage only in the case of MK-BPTI phage
which have previously been incubated with biotinylated
rabbit anti-BPTI IgG. These data indicate that the
affinity selection system described above can be utilized
to select for phage displaying a specific antigen (in
this case BPTI). We estimate an enrichment factor of at
least 40 fold based on the calculation

                     Percent MK-BPTI phage recovered
Enrichment Factor = ---------------------------------
                     Percent MK phage recovered
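
The enrichment calculation above can be sketched in a few lines of Python. The phage counts below are hypothetical and serve only to illustrate the ratio; the function names are likewise illustrative. When no control phage are detected in the eluate, the detection limit can stand in for the control recovery, which yields a lower bound ("at least" N fold).

```python
def percent_recovered(eluted_phage: float, input_phage: float) -> float:
    """Percent of the input phage found in the elution fraction."""
    if input_phage <= 0:
        raise ValueError("input phage count must be positive")
    return 100.0 * eluted_phage / input_phage


def enrichment_factor(pct_display: float, pct_control: float) -> float:
    """Ratio of display-phage recovery to control-phage recovery."""
    if pct_control <= 0:
        raise ValueError("use the detection limit when no control phage are detected")
    return pct_display / pct_control


# Hypothetical numbers: equal inputs of display (MK-BPTI) and control (MK) phage.
pct_display = percent_recovered(eluted_phage=4.0e7, input_phage=2.0e8)  # 20%
pct_control = percent_recovered(eluted_phage=1.0e6, input_phage=2.0e8)  # 0.5%
factor = enrichment_factor(pct_display, pct_control)
print(f"enrichment factor: {factor:.0f} fold")
```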



EXAMPLE XXX 
BPTI:VIII BOUNDARY EXTENSIONS.

To increase the flexibility between BPTI and mature
gVIIIp, we introduced codons for peptide extensions
between these domains.

Extensions to the fusion protein interface.

The M13 gene III product contains 'stalk-like' regions,
as implied by electron micrographic visualization of the
bacteriophage (LOPE85). The predicted amino acid
sequence of this protein contains repeating motifs, which
include:

glu.gly.gly.gly.ser (EGGGS) seven times
gly.gly.gly.ser (GGGS) three times
glu.gly.gly.gly.thr (EGGGT) once.

The aim of this section was to insert, at the domain
interface, multiple unit extensions which would mirror
the repeating motifs observed in the III gene product.

Two synthetic oligonucleotides were synthesized. We
picked the third base of these codons so that translation
of the oligonucleotide in the opposite direction would
yield SER. When annealed, the synthetic oligonucleotides
give the following unit duplex sequence (an EGGGS
linker):

      E   G   G   G   S
5' C.GAG.GGA.GGA.GGA.TC 3'
3'    TC.CCT.CCT.CCT.AGG.C 5'
     (L) (S) (S) (S) (G)

The duplex has a common two base pair 5' overhang (GC)
at either end of the linker, which allows for both the
ligation of multiple units and the ability to clone into
the unique NarI recognition sequence present in OCVs
M13MB48 and GemMB42. This site is positioned within 1
codon of the DNA encoding the interface. The cloning of
an EGGGS linker (or multiple linker) into the vector NarI
site destroys this recognition sequence. Insertion of
the EGGGS linker in reverse orientation leads to
insertion of GSSSL into the fusion protein.
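
The effect of linker orientation can be checked with a short Python sketch. It assumes the standard genetic code and takes the flanking codons (GGT.GG before the cut and C.GCC.GCT.GAA.GGT after it) from the gene sequence given in this section; the variable and function names are illustrative only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons, reading from the first base."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# The two oligonucleotides of the unit duplex, each written 5' to 3'.
TOP = "CGAGGGAGGAGGATC"
BOTTOM = "CGGATCCTCCTCCCT"
# Outside the two-base 5' overhangs the strands are complementary.
assert reverse_complement(TOP[2:]) == BOTTOM[2:]

NAR_I = "GGCGCC"
UPSTREAM = "GGTGG"            # codon 79 plus the GG left of the NarI cut
DOWNSTREAM = "CGCCGCTGAAGGT"  # rest of the NarI site and codons 82-84

parent = UPSTREAM + DOWNSTREAM
forward = UPSTREAM + TOP + DOWNSTREAM
reverse = UPSTREAM + BOTTOM + DOWNSTREAM

print(translate(forward))  # GGEGGGSAAEG
print(translate(reverse))  # GGGSSSLAAEG
print(NAR_I in parent, NAR_I in forward, NAR_I in reverse)  # True False False
```

Either orientation keeps the reading frame, because each unit adds exactly 15 bases, and either orientation destroys the NarI site.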

Addition of a single EGGGS linker at the NarI site of
the gene shown in Table 113 leads to the following gene:

 79  80  80a 80b 80c 80d 80e 81  82  83  84
  G   G   E   G   G   G   S   A   A   E   G
 GGT.GGC.GAG.GGA.GGA.GGA.TCC.GCC.GCT.GAA.GGT



Note that there is no preselection for the orientation
of the linker(s) inserted into the OCV and that multiple
linkers of either orientation (with the predicted EGGGS
or GSSSL amino acid sequence) or a mixture of
orientations (inverted repeats of DNA) could occur.

A ladder of increasingly large multiple linkers was
established by annealing and ligating the two starting
oligonucleotides containing different proportions of 5'
phosphorylated and non-phosphorylated ends. The logic
behind this is that ligation proceeds from the 3'
unphosphorylated end of an oligonucleotide to the 5'
phosphorylated end of another. The use of a mixture of
phosphorylated and non-phosphorylated oligonucleotides allows for
an element of control over the extent of multiple linker
formation. A ladder showing a range of insert sizes was
readily detected by agarose gel electrophoresis, spanning
15 bp (1 unit duplex = 5 amino acids) to greater than 600
base pairs (40 ligated linkers = 200 amino acids).

Large inverted repeats can lead to genetic instability.
Thus we chose to remove them, prior to ligation into the
OCV, by digesting the population of multiple linkers with
the restriction enzymes AccIII or XhoI, since the
linkers, when ligated 'head-to-head' or 'tail-to-tail',
generate these recognition sequences. Such a digestion
significantly reduces the range in sizes of the multiple
linkers to between 1 and 8 linker units (i.e. between 5
and 40 amino acids in steps of 5), as assessed by agarose
gel electrophoresis.
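
The appearance of these sites at inverted junctions can be illustrated with the Python sketch below. It uses the two linker oligonucleotides of this section and the recognition sequences TCCGGA (AccIII) and CTCGAG (XhoI); the orientation labels "F" (EGGGS reading) and "R" (GSSSL reading) and the helper names are assumptions made for the illustration.

```python
from itertools import product

TOP = "CGAGGGAGGAGGATC"     # unit read in the EGGGS orientation ("F")
BOTTOM = "CGGATCCTCCTCCCT"  # unit read in the GSSSL orientation ("R")
SITES = {"AccIII": "TCCGGA", "XhoI": "CTCGAG"}


def concatenate(orientations: str) -> str:
    """Top-strand sequence of linker units ligated in the given orientations."""
    return "".join(TOP if o == "F" else BOTTOM for o in orientations)


def sites_in(seq: str) -> list:
    """Names of the recognition sequences present in seq."""
    return sorted(name for name, site in SITES.items() if site in seq)


# Examine every two-unit junction.
for pair in ("".join(p) for p in product("FR", repeat=2)):
    print(pair, sites_in(concatenate(pair)))

# Sizes along the ladder: each unit adds 15 bp, i.e. 5 amino acids.
for units in (1, 8, 40):
    print(units, "units:", 15 * units, "bp,", 5 * units, "amino acids")
```

In this sketch only the mixed-orientation junctions (FR and RF) contain a recognition sequence, so digestion cuts arrays containing inverted repeats while leaving same-orientation arrays intact.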

The linkers were ligated (as a pool of different insert
sizes or as gel-purified discrete fragments) into
NarI-cleaved OCVs M13MB48 or GemMB42 using standard methods.
Following ligation, the restriction enzyme NarI was added
to remove the self-ligating starting OCV (since linker
insertion destroys the NarI recognition sequence). This
mixture was used to transform XL1-Blue cells and
appropriately plated for plaques (OCV M13MB48) or ampicillin
resistant colonies (OCV GemMB42).

The transformants were screened using dot blot DNA
analysis with one of two 32P-labeled oligonucleotide
probes. One probe consisted of a sequence complementary
to the DNA encoding the P1 loop of BPTI, while the second
had a sequence complementary to the DNA encoding the
domain interface region. Suitable linker candidates
would probe positively with the first probe and
negatively or poorly with the second. Plaque-purified
clones were used to generate phage stocks for binding
analyses and BPTI display, while the RF DNA derived from
phage-infected cells was used for restriction enzyme
analysis and sequencing. Representative insert sequences
of selected clones analyzed are as follows:

M13.3X4   (GG)C.GGA.TCC.TCC.TCC.CT(C.GCC)
               gly ser ser ser leu

M13.3X7   (GG)C.GAG.GGA.GGA.GGA.TC(C.GCC)
               glu gly gly gly ser

M13.3X11  (GG)C.GAG.GGA.GGA.GGA.TCC.GGA.TCC.TCC.
               glu gly gly gly ser gly ser ser

          TCC.CTC.GGA.TCC.TCC.TCC.CT(C.GCC)
          ser leu gly ser ser ser leu

These highly flexible oligomeric linkers are believed to
be useful in joining a binding domain to the major coat
(gene VIII) protein of filamentous phage to facilitate
the display of the binding domain on the phage surface.
They may also be useful in the construction of chimeric
OSPs for other genetic packages as well.

Incorporation of interdomain extension fusion proteins
into phage.

A phage pool containing a variegated pentapeptide
extension at the BPTI:coat protein interface was used to
infect SEF' cells. Using the criteria of the previous
section, we determined that extended fusion proteins were
incorporated into phage. Gel electrophoresis of the
generated phage, followed by silver staining or western
analysis with anti-BPTI rabbit serum, revealed
proteins that migrated similarly to, but discernibly
slower than, the starting fusion protein.

With regard to the 'EGGGS linker' extensions of the
domain interface, individual phage stocks predicted to
contain one or more 5-amino-acid unit extensions were
analyzed in a similar fashion. The migration of the
extended fusion proteins was readily distinguishable
from that of the parent fusion protein when viewed by
western analysis or silver staining. Those clones analyzed in
more detail included M13.3X4 (which contains a single
inverted EGGGS linker with a predicted amino acid
sequence of GSSSL), M13.3X7 (which contains a correctly
orientated linker with a predicted amino acid sequence of
EGGGS), M13.3X11 (which contains 3 linkers with an
inversion and a predicted amino acid sequence for the
extension of EGGGSGSSSLGSSSL) and M13.3Xd, which contains
an extension consisting of at least 5 linkers or 25 amino
acids.

The extended fusion proteins were all incorporated into
phage at high levels (on average tens of copies per phage
were present) and, when analyzed by gel electrophoresis,
migrated at rates consistent with the predicted size of the
extension. Clones M13.3X4 and M13.3X7 migrated at a
position very similar to but discernibly different from
the parent fusion protein, while M13.3X11 and M13.3Xd
were markedly larger.

EXAMPLE XV 
Peptide phage 

The following materials and methods were used in the
examples which follow.

1. Peptide Phage

HPQ6, a putative disulfide-bonded mini-protein, was
displayed on M13 phage as an insert in the gene III
protein (gIIIp). M13 has about five copies of gIIIp per
virion. The phage were constructed by standard methods.
HPQ6 includes the sequence CHPQFPRC characteristic of
Devlin's streptavidin-binding E peptide (DEVL90), as well
as a F.Xa recognition site (see Table 820). HPQ6 phage
were shown to bind to streptavidin.

An unrelated display phage with no affinity for
streptavidin, MKTN, was used as a control.

2. Streptavidin.

Commercially available immobilized to agarose beads
(Pierce): streptavidin (StrAv) immobilized to 6% beaded
agarose at a concentration of 1 to 2 mg per ml gel,
provided as a 50% slurry. Also available as free protein
(Pierce) with a specific activity of 14.6 units per mg (1
unit will bind 1 µg of biotin). A stock solution of 1 mg
per ml in PBS containing 0.01% azide is made.

3. D-Biotin.

Commercially available (Boehringer Mannheim) in
crystallized form. A stock solution of 4 mM is made.

4. Streptavidin coating of microtiter well plates.

Immulon (#2 or #4) strips or plates are used. 100 µL of
StrAv stock is added to each 250 µL capacity well and
incubated overnight at 4°C. The stock is removed and
replaced with 250 µL of PBS containing BSA at a
concentration of 1 mg per mL and left at 4°C for a
further 1 hour. Prior to use in a phage binding assay
the wells are washed rapidly 5 times with 250 µL of PBS
containing 0.1% Tween.

5. Binding Assays.
Bead Assay.

Between 10 and 20 µL of the StrAv bead slurry (5 to 10
µL bead volume) is washed 3 times with binding buffer
(TBS containing BSA at a concentration of 1 mg per mL)
just prior to the binding assay. 50 to 100 µL of binding
buffer containing control or peptide-display phage (10^[…]
to 10^[…] total plaque forming units, pfu's) is added to
each microtube. Binding is allowed to proceed for 1 hour
at room temperature using an end-over-end rotator. The
beads are briefly centrifuged and the supernatant
removed. The beads are washed a further 5 times with 1
mL of TBS containing 0.1% Tween, each wash consisting of
a 5 min incubation and a brief centrifugation. Finally,
the bound phage are eluted from the StrAv beads by a 10
min incubation with pH 2 citrate buffer containing 1 mg
per mL BSA, which is subsequently neutralized with 260 µL
of 1 M Tris pH 8. The number of phage present in each
step is determined as plaque forming units (pfu's)
following appropriate dilutions and plating in a lawn of
F'-containing E. coli.

Plate Assay.

To each StrAv-coated well is added 100 µL of binding
buffer (PBS with 1 mg per mL BSA) containing a known
quantity of phage (between 10^[…] and 10^[…] pfu's). Incubation
proceeds for 1 hr at room temperature, followed by removal
of the non-bound phage and 10 rapid washes with PBS 0.1%
Tween. The bound phage are eluted with 250 µL of pH 2
citrate buffer containing 1 mg per mL BSA, followed by
neutralization with 60 µL of 1 M Tris pH 8. The number of
phage present in each step is determined as plaque
forming units (pfu's) following appropriate dilutions and
plating in a lawn of F'-containing E. coli.

EXAMPLE V

Effect of Dithiothreitol (DTT)
on display phage binding to streptavidin-agarose.

Preliminary control experiments.

a. Use of HRP-conjugated biotin and streptavidin beads.
Binding capacity of StrAv agarose beads for
HRP-conjugated biotin determined to be ~1 µg (equivalent to
~[…] pmol biotin) per 5 µL beads (the amount used in
these experiments).

b. Effect of DTT on HRP-conjugated biotin binding to
StrAv beads.

5 µL of StrAv beads were incubated with 10 ng of
HRP-biotin in binding buffer (TBS-BSA) in the presence of
varying amounts of DTT (at least 99% reduced). Following
a 15 minute incubation at room temperature, the beads
were washed two times in binding buffer and an HRP
substrate added. Color development was allowed to
proceed and noted in a semi-quantitative manner. Table
827 shows that the binding of biotinylated horseradish
peroxidase (HRP) is not greatly affected by
concentrations of DTT below 20 mM. N.B. DTT
concentrations of 20 and 50 mM also inhibited the
interaction of HRP and substrate in the absence of StrAv
beads, hence having a general negative effect in this
system.

Effect of DTT on peptide display phage infectivity.

10^[…] pfus of HPQ6 were added to binding buffer (TBS-BSA)
in the presence of different concentrations of DTT.
The phage were incubated at room temperature for 1 hour,
then diluted and plated to determine titer as pfus.
Table 828 shows the effect of DTT on the infectivity of
phage HPQ6. Hence, either DTT has no effect on phage
infectivities over this range of concentrations or the
effects are reversed on dilution of the phage. From these
control experiments it is apparent that DTT can be used
at concentrations below 10 mM in studies on the effect of
reducing agents on peptide display phage binding to StrAv.

Table 829 shows the effect of DTT on the binding of
phage HPQ6 and MKTN to StrAv beads. The most significant
effect of DTT on HPQ6 binding to StrAv occurred between
0.1 and 1.0 mM DTT, a concentration at which no negative
effects were observed in the preliminary control
experiments. These results strongly indicate that, in
the case of HPQ6 display phage, DTT has a marked effect
on binding to StrAv and that the presence of a disulfide
bridge within the displayed peptide is a requirement for
good binding.

EXAMPLE VI 

RELEASE OF STREPTAVIDIN-BOUND DISPLAY PHAGE
BY FACTOR Xa CLEAVAGE

Phage HPQ6 contains a bovine F.Xa recognition site
(YIEGR/IV). In many instances, IEGR is a sufficient
recognition site for F.Xa, but we have extended the site
in each direction to facilitate efficient cleavage. The
effect of preincubating HPQ6 phage with F.Xa on binding to
StrAv beads is shown in Table 832. Thus, while this
concentration of F.Xa (2.5 units) had no measurable effect
on the titer of the treated display phage, it had a very
marked effect on the ability of the treated display phage
to bind to StrAv. This is consistent with the StrAv
recognition sequence being removed by the action of F.Xa
recognizing and cleaving the YIEGR/IV sequence.

Table 833 shows the effect of F.Xa treatment of HPQ6
following binding to StrAv. Is it possible to remove
display phage bound to their target by the use of F.Xa in
place of pH or chaotropic agent elution? HPQ6 display
phage were allowed to bind to StrAv, then incubated either
in F.Xa buffer or the same buffer together with 2.5 units
of F.Xa for 3 hrs. The amount eluted was compared to the
total number of phage bound as judged by a pH 2 elution.
Therefore, while the display phage are slowly removed in
the buffer alone, the presence of F.Xa significantly
increases this rate.

The removal of HPQ6 display phage from StrAv by F.Xa was
also studied as a function of the amount of enzyme added
and the time of incubation, as shown in Table 834. N.B.
at greater concentrations of the enzyme (1.2 U for 1 hour
or 2.5 U for 2 hours), a loss in infectivity of the
treated phage was noted as measured by pfus.

Table 10: Abundances obtained
from various vgCodons

A. Optimized fxS Codon, Restrained by [D] + [E] = [K] + [R]

             T      C      A      G
      1 |   .26    .18    .26    .30     f
      2 |   .22    .16    .40    .22     x
      3 |   .5     .0     .0     .5      S

   Amino                   Amino
   acid    Abundance       acid    Abundance
    A       4.80%           C       2.86%
    D       6.00%           E       6.00%
    F       2.86%           G       6.60%
    H       3.60%           I       2.86%
    K       5.20%           L       6.82%
    M       2.86%           N       5.20%
    P       2.88%           Q       3.60%
    R       6.82%           S       7.02% mfaa
    T       4.16%           V       6.60%
    W       2.86% lfaa      Y       5.20%
    stop    5.20%

   [D] + [E] = [K] + [R] = .12
   ratio = Abun(W)/Abun(S) = 0.4074

    j    (1/ratio)^j    (ratio)^j       stop-free
    1       2.454        .4074           .9480
    2       6.025        .1660           .8987
    3      14.788        .0676           .8520
    4      36.298        .0275           .8077
    5      89.095        .0112           .7657
    6     218.7          4.57 × 10^-3    .7258
    7     536.8          1.86 × 10^-3    .6881

B. Unrestrained, optimized

             T      C      A      G
      1 |   .27    .19    .27    .27
      2 |   .21    .15    .43    .21
      3 |   .5     .0     .0     .5

   Amino                   Amino
   acid    Abundance       acid    Abundance
    A       4.05%           C       2.84%
    D       5.81%           E       5.81%
    F       2.84%           G       5.67%
    H       4.08%           I       2.84%
    K       5.81%           L       6.83%
    M       2.84%           N       5.81%
    P       2.85%           Q       4.08%
    R       6.83%           S       6.89% mfaa
    T       4.05%           V       5.67%
    W       2.84% lfaa      Y       5.81%
    stop    5.81%

   [D] + [E] = 0.1162     [K] + [R] = 0.1264
   ratio = Abun(W)/Abun(S) = 0.41176

    j    (1/ratio)^j    (ratio)^j       stop-free
    1       2.4286       .41176          .9419
    2       5.8981       .16955          .8872
    3      14.3241       .06981          .8356
    4      34.7875       .02875          .7871
    5      84.4849       .011836         .74135
    6     205.180        .004874         .69828
    7     498.3          2.007 × 10^-3   .6577



C. Optimized NNT

             T        C        A        G
      1 |   .2071    .2929    .2071    .2929
      2 |   .2929    .2071    .2929    .2071
      3 |   1.       .0       .0       .0

   Amino                   Amino
   acid    Abundance       acid    Abundance
    A       6.06%           C       4.29% lfaa
    D       8.58%           E       none
    F       6.06%           G       6.06%
    H       8.58%           I       6.06%
    K       none            L       8.58%
    M       none            N       6.06%
    P       6.06%           Q       none
    R       6.06%           S       8.58% mfaa
    T       4.29% lfaa      V       8.58%
    W       none            Y       6.06%
    stop    none

    j    (1/ratio)^j    (ratio)^j       stop-free
    1       2.0          .5              1.
    2       4.0          .25             1.
    3       8.0          .125            1.
    4      16.0          .0625           1.
    5      32.0          .03125          1.
    6      64.0          .015625         1.
    7     128.0          .0078125        1.

D. Optimized NNG

             T       C       A       G
      1 |   .23     .21     .23     .33
      2 |   .215    .285    .285    .215
      3 |   .0      .0      .0      1.0

   Amino                   Amino
   acid    Abundance       acid    Abundance
    A       9.40%           C       none
    D       none            E       9.40%
    F       none            G       7.10%
    H       none            I       none
    K       6.60%           L       9.50% mfaa
    M       4.90%           N       none
    P       6.00%           Q       6.00%
    R       9.50%           S       6.60%
    T       6.60%           V       7.10%
    W       4.90% lfaa      Y       none
    stop    6.60%

    j    (1/ratio)^j    (ratio)^j       stop-free
    1       1.9388       .51579          0.934
    2       3.7588       .26604          0.8723
    3       7.2876       .13722          0.8148
    4      14.1289       .07078          0.7610
    5      27.3929       3.65 × 10^-2    0.7108
    6      53.109        1.88 × 10^-2    0.6639
    7     102.96         9.72 × 10^-3    0.6200

E. Unoptimized NNS (NNK gives identical distribution)

             T      C      A      G
      1 |   .25    .25    .25    .25
      2 |   .25    .25    .25    .25
      3 |   .0     .5     .0     .5

   Amino                   Amino
   acid    Abundance       acid    Abundance
    A       6.25%           C       3.125%
    D       3.125%          E       3.125%
    F       3.125%          G       6.25%
    H       3.125%          I       3.125%
    K       3.125%          L       9.375%
    M       3.125%          N       3.125%
    P       6.25%           Q       3.125%
    R       9.375%          S       9.375%
    T       6.25%           V       6.25%
    W       3.125%          Y       3.125%
    stop    3.125%

    j    (1/ratio)^j    (ratio)^j       stop-free
    1       3.0          .33333          .96875
    2       9.0          .11111          .9385
    3      27.0          .03704          .90915
    4      81.0          .01234567       .8807
    5     243.0          .0041152        .8532
    6     729.0          1.37 × 10^-3    .82655
    7    2187.0          4.57 × 10^-4    .8007
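
The abundances in Table 10 follow from multiplying the base fractions at the three codon positions and summing over the codons for each amino acid. The Python sketch below recomputes them under the standard genetic code; the function names and the dictionary layout of a nucleotide mix are choices made for this illustration, and "lfaa" is taken here to mean the least-favored amino acid among those actually encoded.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def abundances(mix):
    """Amino-acid (and stop, '*') frequencies for a three-position base mix."""
    totals = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for position, base in enumerate(codon):
            p *= mix[position][base]
        totals[aa] = totals.get(aa, 0.0) + p
    return totals


def summarize(mix, j=1):
    """lfaa/mfaa ratio, its j-th powers, and the stop-free fraction for j codons."""
    ab = abundances(mix)
    encoded = [p for aa, p in ab.items() if aa != "*" and p > 0.0]
    ratio = min(encoded) / max(encoded)
    return {
        "ratio": ratio,
        "inverse_ratio_pow": (1.0 / ratio) ** j,
        "ratio_pow": ratio ** j,
        "stop_free": (1.0 - ab.get("*", 0.0)) ** j,
    }


# Part A of Table 10 (optimized, restrained) and part E (unoptimized NNS).
FXS = [dict(T=.26, C=.18, A=.26, G=.30),
       dict(T=.22, C=.16, A=.40, G=.22),
       dict(T=.5, C=.0, A=.0, G=.5)]
NNS = [dict(T=.25, C=.25, A=.25, G=.25),
       dict(T=.25, C=.25, A=.25, G=.25),
       dict(T=.0, C=.5, A=.0, G=.5)]

for name, mix in (("fxS", FXS), ("NNS", NNS)):
    ab = abundances(mix)
    print(name, {aa: round(100 * p, 3) for aa, p in sorted(ab.items())})
    print(name, summarize(mix, j=2))
```

Substituting the base fractions of parts B, C or D of the table into the same structure reproduces the corresponding columns to within rounding.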
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Table 102b : Annotated Sequence of gene 
after insertion of SslX linker 



5 nucleotide 

number 

5 • - (GGATCC TCTAGA GTC) GGC- 3 
from pGEM polylinker 



10 



15 



25 



t±taea CTTTATGCTTCOGGCTCS £afcaa£ 6TGTG6- 39 
-35 lacUV5 -10 



a ATTGTGAGPnoTr AC-AATT- 59 

lacO-symm operator 



20 oaactc AGAGG CttaCT- 77 

Sad Shine-Dalgarno seg. 



|fM | K | K | S | L | V | L | K | A | S | 
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 
| ATG | AAG | AAA | TCT | CTG | GTT | CTT | AAG | GCT | AGC | - 107 
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I Afl Til Nhe I I 



| V | A | V | A | T | L | V | P | M | L | 
| 11| 12| 13| 14| 15| 16| 17| 18| 19 | 20| 
| GTT 1 6CT | GTC | GCG | ACC | CTG | GTA | CCT | ATS | TTG | 137 
I Nru ll I KPB I | 



10 | S | F | A | R | P | D | F I C | L | E | 

| 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 
|TCC|TTC|GCT|CGT|CCG|GAT|TTC|TGT|CTC|GAG| - 167 

t IacciiiI I Ava J, I 

M13/BPTI Jnct: I yho T I 

| P | P | Y | T | G 1 P | C | K | A 1 R | 

. | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 
|ocA|ocA|TAC|Acr|oaa|occ|Tac|AAA|aco|ooc|. 197 

20 I Pf 1M 1 L I I IbbsH III 

1 Ana I I 1 

I Pra 11 1 
I PBS I I 
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Table 102b : Annotated Sequence 
of gene after insertion of Sail linker 
(continued) 

' 5 

| I | I I R I I I F I Y I » U | K | A | 
| 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 1 50 | 
|ATC|ATC|CGC|TAT|TTC|TAC|AAT|GCT|AAA|GC | - 226 

10 

| G | t | C | Q | T J F | V | Y | G | 6 | 
| 51| 52.| 53| 54| 55 1 56 1 57| 58 1 59 | 60 | 
A| GGC | CTG | TGC | CAG| ACC | TTT | GTA| TAC | GGT | GGT | - 257 
I Sfcn I' 1 ACC I I 

15 I Xca I | 



| C | R | A | K | R | N | N | P | K | 
| 61| 62| 63| 64| 65 | 66 | 67| 68 | 69 | 
| TGC | CGT | GCT | AAG | CGT | AAC | AAC|TTT | AAA| - 
I gSP X L 

| S | A | E | D | C | M | R j T | C | G | 

| 70| 71| 72| 73| 74| 75| 76| 77| 78 | 79 | 
| TCG | GCC | GAA | GAT | TGC | ATG | CGT J ACC | TGC | GGT | - 

Ixtnallll \ Spft J\ 
      BPTI/M13 boundary
            v
|  G  |  A  |  A  |  E  |  G  |  D  |  D  |  P  |  A  |  K  |  A  |  A  |
|  80 |  81 |  82 |  83 |  84 |  85 |  86 |  87 |  88 |  89 |  90 |  91 |
| GGC | GCC | GCT | GAA | GGT | GAT | GAT | CCG | GCC | AAG | GCG | GCC | - 350
|   Bbe I   |                             |            Sfi I            |

|  F  |  N  |  S  |  L  |  Q  |  A  |  S  |  A  |  T  |
|  92 |  93 |  94 |  95 |  96 |  97 |  98 |  99 | 100 |
| TTC | AAT | TCT | CTG | CAA | GCT | TCT | GCT | ACC | - 377
                        |  Hind III |

|  E  |  Y  |  I  |  G  |  Y  |  A  |  W  |
| 101 | 102 | 103 | 104 | 105 | 106 | 107 |
| GAG | TAT | ATT | GGT | TAC | GCG | TGG | - 398

|  A  |  M  |  V  |  V  |  V  |  I  |  V  |  G  |  A  |
| 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 |
| GCC | ATG | GTG | GTG | GTT | ATC | GTT | GGT | GCT | - 425
|        BstX I         |
Table 102b: Annotated Sequence
after insertion of Sal I linker
(continued)

|  T  |  I  |  G  |  I  |
| 117 | 118 | 119 | 120 |
| ACC | ATC | GGG | ATC | - 437

|  K  |  L  |  F  |  K  |  K  |  F  |  T  |  S  |  K  |  A  |
| 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 |
| AAA | CTG | TTC | AAG | AAG | TTT | ACT | TCG | AAG | GCG | - 467
                                    |     Asu II      |

|  S  |  .  |  .  |  .  |
| 131 | 132 | 133 | 134 |
| TCT | TAA | TGA | TAG | GGTTACC - 486
                        | BstE II |

AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT - 521
       |         terminator           |

aTCGA GACctgca GGTCGACC ggcatgc-3'
               | Sal I |
Note the following enzyme equivalences:

Xma III = Eag I          Acc III = BspM II
Dra II  = EcoO109 I      Asu II  = BstB I
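The restriction-site annotations in Table 102b can be checked mechanically. The following is a minimal sketch, not part of the original disclosure: the function name `find_sites` and the small `IUPAC` map are illustrative assumptions, and the recognition sequences used (BstE II = GGTNACC, Asu II / BstB I = TTCGAA) are the commonly published ones rather than values stated in the table.

```python
import re

# IUPAC nucleotide codes needed for the recognition sequences used here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def find_sites(dna, recognition):
    """Return 0-based start positions of every match, overlaps included."""
    pattern = "".join(IUPAC[base] for base in recognition.upper())
    # The lookahead reports overlapping occurrences as separate hits.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", dna.upper())]


# Codons 131-134 of Table 102b followed by the BstE II site.
tail = "TCTTAATGATAGGGTTACC"
print(find_sites(tail, "GGTNACC"))      # BstE II

# Codons 127-130 (ACT TCG AAG GCG) contain the Asu II / BstB I site.
print(find_sites("ACTTCGAAGGCG", "TTCGAA"))
```

Because isoschizomers such as Asu II and BstB I share one recognition sequence, a single pattern covers both names in the equivalence list.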
Table 107: In vitro transcription/translation
analysis of vector-encoded
signal::BPTI::mature VIII protein species

                      31 kd species(a)    14.5 kd species(b)

No DNA (control)
pGEM-3Zf(-)                  +
pGEM-MB16                    +
pGEM-MB20                    +                   +
pGEM-MB26                    +                   +
pGEM-MB42                    +                   +
pGEM-MB46                    ND                  ND

Notes:

a) pre-beta-lactamase, encoded by the afflB gene.

b) pre-BPTI/VIII peptides encoded by the synthetic
gene and derived constructs.

c) - for absence of product; + for presence of
product; ND for Not Determined.
Table 108: Western analysis(a) of in vivo
expressed
signal::BPTI::mature VIII protein species

A) expression in strain XL1-Blue

                signal    14.5 kd species(b)    12 kd species(c)

pGEM-3Zf(-)       -            -(d)                   -
pGEM-MB16         VIII
pGEM-MB20         VIII         ++
pGEM-MB26         VIII         +++                    +/-
pGEM-MB42         phoA         ++                     +

B) expression in strain SEF'

                signal    14.5 kd species(b)    12 kd species(c)

pGEM-MB42         phoA         +/-                    +++

Notes:

a) Analysis using rabbit anti-BPTI polyclonal antibodies
and horseradish-peroxidase-conjugated goat anti-rabbit IgG
antibody.

b) pro-BPTI/VIII peptides encoded by the synthetic gene
and derived constructs.

c) processed BPTI/VIII peptide encoded by the synthetic
gene.

d) not present             -
   weakly present          +/-
   present                 +
   strong presence         ++
   very strong presence    +++
Table 109: M13 gene III

1579        5'-GT GAAAAAATTA TTATTCGCAA TTCCTTTAGT
1611   TGTTCCTTTC TATTCTCACT CCGCTGAAAC TGTTGAAAGT
1651   TGTTTAGCAA AACCCCATAC AGAAAATTCA TTTACTAACG
1691   TCTGGAAAGA CGACAAAACT TTAGATCGTT ACGCTAACTA
1731   TGAGGGTTGT CTGTGGAATG CTACAGGCGT TGTAGTTTGT
1771   ACTGGTGACG AAACTCAGTG TTACGGTACA TGGGTTCCTA
1811   TTGGGCTTGC TATCCCTGAA AATGAGGGTG GTGGCTCTGA
1851   GGGTGGCGGT TCTGAGGGTG GCGGTTCTGA GGGTGGCGGT
1891   ACTAAACCTC CTGAGTACGG TGATACACCT ATTCCGGGCT
1931   ATACTTATAT CAACCCTCTC GACGGCACTT ATCCGCCTGG
1971   TACTGAGCAA AACCCCGCTA ATCCTAATCC TTCTCTTGAG
2011   GAGTCTCAGC CTCTTAATAC TTTCATGTTT CAGAATAATA
2051   GGTTCCGAAA TAGGCAGGGG GCATTAACTG TTTATACGGG
2091   CACTGTTACT CAAGGCACTG ACCCCGTTAA AACTTATTAC
2131   CAGTACACTC CTGTATCATC AAAAGCCATG TATGACGCTT
2171   ACTGGAACGG TAAATTCAGA GACTGCGCTT TCCATTCTGG
2211   CTTTAATGAG GATCCATTCG TTTGTGAATA TCAAGGCCAA
2251   TCGTCTGACC TGCCTCAACC TCCTGTCAAT GCTGGCGGCG
2291   GCTCTGGTGG TGGTTCTGGT GGCGGCTCTG AGGGTGGTGG
2331   CTCTGAGGGT GGCGGTTCTG AGGGTGGCGG CTCTGAGGGA
2371   GGCGGTTCCG GTGGTGGCTC TGGTTCCGGT GATTTTGATT
2411   ATGAAAAGAT GGCAAACGCT AATAAGGGGG CTATGACCGA
2451   AAATGCCGAT GAAAACGCGC TACAGTCTGA CGCTAAAGGC
2491   AAACTTGATT CTGTCGCTAC TGATTACGGT GCTGCTATCG
2531   ATGGTTTCAT TGGTGACGTT TCCGGCCTTG CTAATGGTAA
2571   TGGTGCTACT GGTGATTTTG CTGGCTCTAA TTCCCAAATG
2611   GCTCAAGTCG GTGACGGTGA TAATTCACCT TTAATGAATA
2651   ATTTCCGTCA ATATTTACCT TCCCTCCCTC AATCGGTTGA
2691   ATGTCGCCCT TTTGTCTTTA GCGCTGGTAA ACCATATGAA
2731   TTTTCTATTG ATTGTGACAA AATAAACTTA TTCCGTGGTG
2771   TCTTTGCGTT TCTTTTATAT GTTGCCACCT TTATGTATGT
2811   ATTTTCTACG TTTGCTAACA TACTGCGTAA TAAGGAGTCT
2851   TAATCATGCC AGTTCTTTTG GGTATTCCGT
Table 110: Introduction of Nar I into gene III

A) Wild-type III, portion encoding the signal peptide

          M   K   K   L   L   F   A   I   P   L
          1   2   3   4   5   6   7   8   9   10
1579  5'-GTG AAA AAA TTA TTA TTC GCA ATT CCT TTA

                                        | Cleavage site
                                        v
          V   V   P   F   Y   S   H   S   A   E   T   V
          11  12  13  14  15  16  17  18  19  20  21  22
1609     GTT GTT CCT TTC TAT TCT CAC TCC GCT GAA ACT GTT-3'

B) III, portion encoding the signal peptide with Nar I site

          m   k   k   l   l   f   a   i   p   l
          1   2   3   4   5   6   7   8   9   10
1579  5'-gtg aaa aaa tta tta ttc gca att cct tta

                                        | cleavage site
                                        v
          v   v   p   f   y   s   G   A   a   e   t   v
          11  12  13  14  15  16  17  18  19  20  21  22
1609     gtt gtt cct ttc tat tct GGc Gcc gct gaa act gtt-3'
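The effect of the change shown in Table 110 can be illustrated with a short sketch. This is only an illustration under stated assumptions: the helper `translate` and the variable names are hypothetical, the standard genetic code is used, and GGCGCC is taken as the Nar I recognition sequence. The two strings are codons 11-22 from parts A and B of the table.

```python
from itertools import product

# Standard genetic code in TCAG order; "." marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate an in-frame DNA string codon by codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - 2, 3))


NAR_I = "GGCGCC"
wild_type = "GTTGTTCCTTTCTATTCTCACTCCGCTGAAACTGTT"  # codons 11-22, part A
with_site = "GTTGTTCCTTTCTATTCTGGCGCCGCTGAAACTGTT"  # codons 11-22, part B

# Positions (0-based) at which the two sequences differ.
changed = [i for i, (a, b) in enumerate(zip(wild_type, with_site)) if a != b]

print(translate(wild_type), NAR_I in wild_type)
print(translate(with_site), NAR_I in with_site)
print(changed)
```

Three base changes in codons 17 and 18 replace His-Ser with Gly-Ala and create the site immediately upstream of the signal-peptide cleavage point.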
Table 113: Annotated Sequence of
pGEM-MB42 comprising Ptac::RBS(GGAGGAAATAAA)::
phoA-signal::mature-bpti::mature-VIII-coat-protein

5'-GGATCC actccccatcccc
   | BamH I |

ctg TTGACA attaatcatcgGCTCG tataat GTGTGG-
     -35                    -10

aATTGTGAGCGcTcACAATT-
 lacO-symm operator

                       |  M  |  K  |  Q  |  S  |  T  |
                       |  1  |  2  |  3  |  4  |  5  |
GAGCTCCATGGGAGAAAATAAA | ATG | AAA | CAA | AGC | ACG |
| Sac I |              |<-- phoA signal peptide

|  I  |  A  |  L  |  L  |  P  |  L  |  L  |  F  |  T  |  P  |  V  |  T  |
|  6  |  7  |  8  |  9  |  10 |  11 |  12 |  13 |  14 |  15 |  16 |  17 |
| ATC | GCA | CTC | TTA | CCG | TTA | CTG | TTT | ACC | CCT | GTG | ACA |
  phoA signal continues

(There are no residues 20-23.)

|  K  |  A  |  R  |  P  |  D  |  F  |  C  |  L  |  E  |
|  18 |  19 |  24 |  25 |  26 |  27 |  28 |  29 |  30 |
| AAA | GCC | CGT | CCG | GAT | TTC | TGT | CTC | GAG |
phoA signal->|<-- BPTI insert             |   Ava I   |
     phoA/BPTI Jnct                       |   Xho I   |

|  P  |  P  |  Y  |  T  |  G  |  P  |  C  |  K  |  A  |  R  |
|  31 |  32 |  33 |  34 |  35 |  36 |  37 |  38 |  39 |  40 |
| CCA | CCA | TAC | ACT | GGG | CCC | TGC | AAA | GCG | CGC |
      |        PflM I         |                 |  BssH II  |
                        |  Dra II   |
                        |   Pss I   |
Table 113: Annotated Sequence of
Ptac::RBS(GGAGGAAATAAA)::
phoA-signal::mature-bpti::mature-VIII-coat-protein gene

(continued)

|  I  |  I  |  R  |  Y  |  F  |  Y  |  N  |  A  |  K  |  A  |
|  41 |  42 |  43 |  44 |  45 |  46 |  47 |  48 |  49 |  50 |
| ATC | ATC | CGC | TAT | TTC | TAC | AAT | GCT | AAA | GC

 |  G  |  L  |  C  |  Q  |  T  |  F  |  V  |  Y  |  G  |  G  |
 |  51 |  52 |  53 |  54 |  55 |  56 |  57 |  58 |  59 |  60 |
A| GGC | CTG | TGC | CAG | ACC | TTT | GTA | TAC | GGT | GGT |
 |   Stu I   |                       |   Acc I   |
                                     |   Xca I   |

|  C  |  R  |  A  |  K  |  R  |  N  |  N  |  F  |  K  |
|  61 |  62 |  63 |  64 |  65 |  66 |  67 |  68 |  69 |
| TGC | CGT | GCT | AAG | CGT | AAC | AAC | TTT | AAA |
            |   Esp I   |

|  S  |  A  |  E  |  D  |  C  |  M  |  R  |  T  |  C  |  G  |
|  70 |  71 |  72 |  73 |  74 |  75 |  76 |  77 |  78 |  79 |
| TCG | GCC | GAA | GAT | TGC | ATG | CGT | ACC | TGC | GGT |
|  Xma III  |           |   Sph I   |
  BPTI insert

|  G  |  A  |
|  80 |  81 |
| GGC | GCC |
|   Bbe I   |
|   Nar I   |
  -- BPTI -->

BPTI/M13 boundary
v
|  A  |  E  |  G  |  D  |  D  |  P  |  A  |  K  |  A  |  A  |
|  82 |  83 |  84 |  85 |  86 |  87 |  88 |  89 |  90 |  91 |
| GCT | GAA | GGT | GAT | GAT | CCG | GCC | AAG | GCG | GCC |
                              |            Sfi I            |
  mature gene VIII coat protein

|  F  |  N  |  S  |  L  |  Q  |  A  |  S  |  A  |  T  |
|  92 |  93 |  94 |  95 |  96 |  97 |  98 |  99 | 100 |
| TTC | AAT | TCT | CTG | CAA | GCT | TCT | GCT | ACC |
                        |  Hind III |

|  E  |  Y  |  I  |  G  |  Y  |  A  |  W  |
| 101 | 102 | 103 | 104 | 105 | 106 | 107 |
| GAG | TAT | ATT | GGT | TAC | GCG | TGG |
Table 113: Annotated Sequence of
Ptac::RBS(GGAGGAAATAAA)::
phoA-signal::mature-bpti::mature-VIII-coat-protein gene

(continued)

|  A  |  M  |  V  |  V  |  V  |  I  |  V  |  G  |  A  |
| 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 |
| GCC | ATG | GTG | GTG | GTT | ATC | GTT | GGT | GCT |
|        BstX I         |
|   Nco I   |

|  T  |  I  |  G  |  I  |
| 117 | 118 | 119 | 120 |
| ACC | ATC | GGG | ATC |

|  K  |  L  |  F  |  K  |  K  |  F  |  T  |  S  |  K  |  A  |
| 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 |
| AAA | CTG | TTC | AAG | AAG | TTT | ACT | TCG | AAG | GCG |
                                    |     Asu II      |

|  S  |  .  |  .  |  .  |
| 131 | 132 | 133 | 134 |
| TCT | TAA | TGA | TAG | GGTTACC-
                        | BstE II |

AGTCTA AGCCCGC CTAATGA GCGGGCT TTTTTTTT-
       |         terminator           |

aTCGA GACctgca GGTCGAC-3'
               | Sal I |
Table 114: Neutralization of Phage Titer Using
Agarose-immobilized Anhydro-Trypsin

                             Percent Residual Titer
                          As a Function of Time (hours)

Phage Type    Addition          1       2       4

MK-BPTI        5 µl IS          99     104     105
               2 µl IAT         82      71      51
               5 µl IAT         57      40      27
              10 µl IAT         40      30      24

MK             5 µl IS         106      96      98
               2 µl IAT         97     103      95
               5 µl IAT        110     111      96
              10 µl IAT         99      93     106

Legend:

IS  = Immobilized streptavidin
IAT = Immobilized anhydro-trypsin
Table 115: Affinity Selection of MK-BPTI Phage
on Immobilized Anhydro-Trypsin

                               Percent of Total Phage
Phage Type    Addition       Recovered in Elution Buffer

MK-BPTI        5 µl IS                <<1*
               2 µl IAT                 5
               5 µl IAT                20
              10 µl IAT                50

MK             5 µl IS                <<1*
               2 µl IAT               <<1
               5 µl IAT               <<1
              10 µl IAT               <<1

Legend:

IS  = Immobilized streptavidin
IAT = Immobilized anhydro-trypsin
* not detectable.
Table 116: Translation of signal-III::bpti::mature-III

  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
fM   K   K   L   L   F   A   I   P   L   V   V   P   F   Y
GTG AAA AAA TTA TTA TTC GCA ATT CCT TTA GTT GTT CCT TTC TAT
|<-- gene III signal peptide

 16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
 S   G   A   R   P   D   F   C   L   E   P   P   Y   T   G
TCT GGC GCC cgt ccg gat ttc tgt ctc gag cca cca tac act ggg
         ->|<-- BPTI insertion

 31  32  33  34  35  36  37  38  39  40  41  42  43  44  45
 P   C   K   A   R   I   I   R   Y   F   Y   N   A   K   A
ccc tgc aaa gcg cgc atc atc cgc tat ttc tac aat gct aaa gca

 46  47  48  49  50  51  52  53  54  55  56  57  58  59  60
 G   L   C   Q   T   F   V   Y   G   G   C   R   A   K   R
ggc ctg tgc cag acc ttt gta tac ggt ggt tgc cgt gct aag cgt

 61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
 N   N   F   K   S   A   E   D   C   M   R   T   C   G   G
aac aac ttt aaa tcg gcc gaa gat tgc atg cgt acc tgc ggt ggc

 76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 A   G   A   A   E   T   V   E   S   C   L   A   K   P   H
gcc GGC GCC GCT GAA ACT GTT GAA AGT TGT TTA GCA AAA CCC CAT
   |<-- mature gene III protein

 91  92  93  94  95  96  97  98  99 100 101 102 103 104 105
 T   E   N   S   F   T   N   V   W   K   D   D   K   T   L
ACA GAA AAT TCA TTT ACT AAC GTC TGG AAA GAC GAC AAA ACT TTA
Table 116: Translation of signal-III::bpti::mature-III
(continued)

106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
 D   R   Y   A   N   Y   E   G   C   L   W   N   A   T   G
GAT CGT TAC GCT AAC TAT GAG GGT TGT CTG TGG AAT GCT ACA GGC

121 122 123 124 125 126 127 128 129 130 131 132 133 134 135
 V   V   V   C   T   G   D   E   T   Q   C   Y   G   T   W
GTT GTA GTT TGT ACT GGT GAC GAA ACT CAG TGT TAC GGT ACA TGG

136 137 138 139 140 141 142 143 144 145 146 147 148 149 150
 V   P   I   G   L   A   I   P   E   N   E   G   G   G   S
GTT CCT ATT GGG CTT GCT ATC CCT GAA AAT GAG GGT GGT GGC TCT

151 152 153 154 155 156 157 158 159 160 161 162 163 164 165
 E   G   G   G   S   E   G   G   G   S   E   G   G   G   T
GAG GGT GGC GGT TCT GAG GGT GGC GGT TCT GAG GGT GGC GGT ACT

166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
 K   P   P   E   Y   G   D   T   P   I   P   G   Y   T   Y
AAA CCT CCT GAG TAC GGT GAT ACA CCT ATT CCG GGC TAT ACT TAT

181 182 183 184 185 186 187 188 189 190 191 192 193 194 195
 I   N   P   L   D   G   T   Y   P   P   G   T   E   Q   N
ATC AAC CCT CTC GAC GGC ACT TAT CCG CCT GGT ACT GAG CAA AAC

196 197 198 199 200 201 202 203 204 205 206 207 208 209 210
 P   A   N   P   N   P   S   L   E   E   S   Q   P   L   N
CCC GCT AAT CCT AAT CCT TCT CTT GAG GAG TCT CAG CCT CTT AAT

211 212 213 214 215 216 217 218 219 220 221 222 223 224 225
 T   F   M   F   Q   N   N   R   F   R   N   R   Q   G   A
ACT TTC ATG TTT CAG AAT AAT AGG TTC CGA AAT AGG CAG GGG GCA
Table 116: Translation of signal-III::bpti::mature-III
(continued)

226 227 228 229 230 231 232 233 234 235 236 237 238 239 240
 L   T   V   Y   T   G   T   V   T   Q   G   T   D   P   V
TTA ACT GTT TAT ACG GGC ACT GTT ACT CAA GGC ACT GAC CCC GTT

241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
 K   T   Y   Y   Q   Y   T   P   V   S   S   K   A   M   Y
AAA ACT TAT TAC CAG TAC ACT CCT GTA TCA TCA AAA GCC ATG TAT

256 257 258 259 260 261 262 263 264 265 266 267 268 269 270
 D   A   Y   W   N   G   K   F   R   D   C   A   F   H   S
GAC GCT TAC TGG AAC GGT AAA TTC AGA GAC TGC GCT TTC CAT TCT

271 272 273 274 275 276 277 278 279 280 281 282 283 284 285
 G   F   N   E   D   P   F   V   C   E   Y   Q   G   Q   S
GGC TTT AAT GAG GAT CCA TTC GTT TGT GAA TAT CAA GGC CAA TCG

286 287 288 289 290 291 292 293 294 295 296 297 298 299 300
 S   D   L   P   Q   P   P   V   N   A   G   G   G   S   G
TCT GAC CTG CCT CAA CCT CCT GTC AAT GCT GGC GGC GGC TCT GGT

301 302 303 304 305 306 307 308 309 310 311 312 313 314 315
 G   G   S   G   G   G   S   E   G   G   G   S   E   G   G
GGT GGT TCT GGT GGC GGC TCT GAG GGT GGT GGC TCT GAG GGT GGC

316 317 318 319 320 321 322 323 324 325 326 327 328 329 330
 G   S   E   G   G   G   S   E   G   G   G   S   G   G   G
GGT TCT GAG GGT GGC GGC TCT GAG GGA GGC GGT TCC GGT GGT GGC

331 332 333 334 335 336 337 338 339 340 341 342 343 344 345
 S   G   S   G   D   F   D   Y   E   K   M   A   N   A   N
TCT GGT TCC GGT GAT TTT GAT TAT GAA AAG ATG GCA AAC GCT AAT



WO 92/15679 



PCT/US92/01539 



127 

Table 116: Translation of signal-III::bpti::mature-III
(continued)

346 347 348 349 350 351 352 353 354 355 356 357 358 359 360
 K   G   A   M   T   E   N   A   D   E   N   A   L   Q   S
AAG GGG GCT ATG ACC GAA AAT GCC GAT GAA AAC GCG CTA CAG TCT

361 362 363 364 365 366 367 368 369 370 371 372 373 374 375
 D   A   K   G   K   L   D   S   V   A   T   D   Y   G   A
GAC GCT AAA GGC AAA CTT GAT TCT GTC GCT ACT GAT TAC GGT GCT

376 377 378 379 380 381 382 383 384 385 386 387 388 389 390
 A   I   D   G   F   I   G   D   V   S   G   L   A   N   G
GCT ATC GAT GGT TTC ATT GGT GAC GTT TCC GGC CTT GCT AAT GGT

391 392 393 394 395 396 397 398 399 400 401 402 403 404 405
 N   G   A   T   G   D   F   A   G   S   N   S   Q   M   A
AAT GGT GCT ACT GGT GAT TTT GCT GGC TCT AAT TCC CAA ATG GCT

406 407 408 409 410 411 412 413 414 415 416 417 418 419 420
 Q   V   G   D   G   D   N   S   P   L   M   N   N   F   R
CAA GTC GGT GAC GGT GAT AAT TCA CCT TTA ATG AAT AAT TTC CGT

421 422 423 424 425 426 427 428 429 430 431 432 433 434 435
 Q   Y   L   P   S   L   P   Q   S   V   E   C   R   P   F
CAA TAT TTA CCT TCC CTC CCT CAA TCG GTT GAA TGT CGC CCT TTT

436 437 438 439 440 441 442 443 444 445 446 447 448 449 450
 V   F   S   A   G   K   P   Y   E   F   S   I   D   C   D
GTC TTT AGC GCT GGT AAA CCA TAT GAA TTT TCT ATT GAT TGT GAC
Table 116: Translation of signal-III::bpti::mature-III
(continued)

451 452 453 454 455 456 457 458 459 460 461 462 463 464 465
 K   I   N   L   F   R   G   V   F   A   F   L   L   Y   V
AAA ATA AAC TTA TTC CGT GGT GTC TTT GCG TTT CTT TTA TAT GTT
                        |<-- uncharged anchor region

466 467 468 469 470 471 472 473 474 475 476 477 478 479 480
 A   T   F   M   Y   V   F   S   T   F   A   N   I   L   R
gcc acc TTT atg tat gta ttt TCT ACG TTT GCT AAC ATA CTG CGT
uncharged anchor region continues -->|

481 482 483 484 485
 N   K   E   S   .
AAT AAG GAG TCT TAA

Molecular weight of peptide = 58884

Charge on peptide = -20

[A+G+P] = 142

[C+F+H+I+L+M+V+W+Y] = 140

[D+E+K+R+N+Q+S+T+.] = 202
Table 116: Translation of signal-III::bpti::mature-III
(continued)

                   Second Base
First                                     Third
Base        t      c      a      g        Base

 t         15     21     15      8          t
           12      5     10      6          c
           10      4      0      0          a
            0      3      0      4          g

 c          6     20      2      8          t
            3      4      0      3          c
            1      4      9      1          a
            4      3      7      0          g

 a          5     19     21      1          t
            5      4     11      1          c
            2      4     16      1          a
            8      2      4      2          g

 g         13     22     14     41          t
            6      7     12     29          c
            4      5     12      1          a
            1      3     16      4          g

AA    #      AA    #      AA    #      AA    #

A    37      C    14      D    26      E    28
F    27      G    75      H     2      I    12
K    20      L    24      M     9      N    32
P    31      Q    16      R    15      S    35
T    29      V    23      W     4      Y    25
.     1
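A codon-usage grid laid out like the one above (first base selects the block, third base the row within the block, second base the column) can be tallied as follows. This is a minimal sketch: `codon_usage` and `usage_grid` are hypothetical helper names, the input is assumed to be in frame, and the example string is only the first ten codons of Table 116, not the full gene.

```python
from collections import Counter

BASES = "tcag"  # base ordering used for blocks, rows and columns


def codon_usage(dna):
    """Count in-frame codons of a DNA string (case-insensitive)."""
    dna = dna.lower()
    return Counter(dna[i:i + 3] for i in range(0, len(dna) - 2, 3))


def usage_grid(dna):
    """Return 16 rows of 4 counts: rows by (first, third) base, columns by second."""
    usage = codon_usage(dna)
    return [[usage[first + second + third] for second in BASES]
            for first in BASES for third in BASES]


# Codons 1-10 of Table 116 (gene III signal peptide start).
signal_start = "GTGAAAAAATTATTATTCGCAATTCCTTTA"
for row in usage_grid(signal_start):
    print(" ".join(f"{count:3d}" for count in row))
```

Summing the four rows of a block that share an amino acid reproduces the per-residue totals in the composition list, apart from the initiator GTG, which the table counts as M rather than V.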
Table 130: Sampling of a Library encoded by (NNK)6

A. Numbers of hexapeptides in each class

total = 64,000,000 stop-free sequences.

α can be one of [WMFYCIKDENHQ]
Φ can be one of [PTAVG]
Ω can be one of [SLR]

αααααα     2985984.        Φααααα     7464960.
Ωααααα     4478976.        ΦΦαααα     7776000.
ΦΩαααα     9331200.        ΩΩαααα     2799360.
ΦΦΦααα     4320000.        ΦΦΩααα     7776000.
ΦΩΩααα     4665600.        ΩΩΩααα      933120.
ΦΦΦΦαα     1350000.        ΦΦΦΩαα     3240000.
ΦΦΩΩαα     2916000.        ΦΩΩΩαα     1166400.
ΩΩΩΩαα      174960.        ΦΦΦΦΦα      225000.
ΦΦΦΦΩα      675000.        ΦΦΦΩΩα      810000.
ΦΦΩΩΩα      486000.        ΦΩΩΩΩα      145800.
ΩΩΩΩΩα       17496.        ΦΦΦΦΦΦ       15625.
ΦΦΦΦΦΩ       56250.        ΦΦΦΦΩΩ       84375.
ΦΦΦΩΩΩ       67500.        ΦΦΩΩΩΩ       30375.
ΦΩΩΩΩΩ        7290.        ΩΩΩΩΩΩ         729.

ΦΦΩΩαα, for example, stands for the set of peptides having
two amino acids from the α class, two from Φ, and two from
Ω arranged in any order. There are, for example, 729 = 3^6
sequences composed entirely of S, L, and R.
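The class sizes in part A follow from counting arrangements. The sketch below is illustrative only: the function `class_size` and the constant names are assumptions introduced here, and a class is identified by how many of its six positions are Φ and how many are Ω (the rest being α).

```python
from math import factorial

ALPHA = "WMFYCIKDENHQ"   # amino acids with one NNK codon
PHI = "PTAVG"            # amino acids with two NNK codons
OMEGA = "SLR"            # amino acids with three NNK codons


def class_size(n_phi, n_omega, length=6):
    """Number of distinct peptides having the given class composition."""
    n_alpha = length - n_phi - n_omega
    # Ways to place the three kinds of position along the peptide.
    arrangements = factorial(length) // (
        factorial(n_alpha) * factorial(n_phi) * factorial(n_omega))
    return (arrangements * len(ALPHA) ** n_alpha
            * len(PHI) ** n_phi * len(OMEGA) ** n_omega)


sizes = {(p, o): class_size(p, o) for p in range(7) for o in range(7 - p)}
total = sum(sizes.values())

print(len(sizes), total)
print(sizes[(0, 0)], sizes[(2, 2)], sizes[(0, 6)])
```

The 28 classes partition all 20^6 hexapeptides, which is why the class sizes sum to 64,000,000.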
Table 130: Sampling of a Library encoded by (NNK)6

(continued)

B. Probability that any given stop-free DNA sequence will
encode a hexapeptide from a stated class.

Class           P          % of class

αααααα      3.364E-03     (1.13E-07)
Φααααα      1.682E-02     (2.25E-07)
Ωααααα      1.514E-02     (3.38E-07)
ΦΦαααα      3.505E-02     (4.51E-07)
ΦΩαααα      6.308E-02     (6.76E-07)
ΩΩαααα      2.839E-02     (1.01E-06)
ΦΦΦααα      3.894E-02     (9.01E-07)
ΦΦΩααα      1.051E-01     (1.35E-06)
ΦΩΩααα      9.463E-02     (2.03E-06)
ΩΩΩααα      2.839E-02     (3.04E-06)
ΦΦΦΦαα      2.434E-02     (1.80E-06)
ΦΦΦΩαα      8.762E-02     (2.70E-06)
ΦΦΩΩαα      1.183E-01     (4.06E-06)
ΦΩΩΩαα      7.097E-02     (6.08E-06)
ΩΩΩΩαα      1.597E-02     (9.13E-06)
ΦΦΦΦΦα      8.113E-03     (3.61E-06)
ΦΦΦΦΩα      3.651E-02     (5.41E-06)
ΦΦΦΩΩα      6.571E-02     (8.11E-06)
ΦΦΩΩΩα      5.914E-02     (1.22E-05)
ΦΩΩΩΩα      2.661E-02     (1.83E-05)
ΩΩΩΩΩα      4.790E-03     (2.74E-05)
ΦΦΦΦΦΦ      1.127E-03     (7.21E-06)
ΦΦΦΦΦΩ      6.084E-03     (1.08E-05)
ΦΦΦΦΩΩ      1.369E-02     (1.62E-05)
ΦΦΦΩΩΩ      1.643E-02     (2.43E-05)
ΦΦΩΩΩΩ      1.109E-02     (3.65E-05)
ΦΩΩΩΩΩ      3.992E-03     (5.48E-05)
ΩΩΩΩΩΩ      5.988E-04     (8.21E-05)
Table 130: Sampling of a Library encoded by (NNK)6

(continued)

C. Number of different stop-free amino-acid sequences in
each class expected for various library sizes

Library size = 1.0000E+06
total = 9.7446E+05    % sampled = 1.52

Class       Number    (%)        Class       Number    (%)

αααααα      3362.6 (  .1)        Φααααα     16803.4 (  .2)
Ωααααα     15114.6 (  .3)        ΦΦαααα     34967.8 (  .4)
ΦΩαααα     62871.1 (  .7)        ΩΩαααα     28244.3 ( 1.0)
ΦΦΦααα     38765.7 (  .9)        ΦΦΩααα    104432.2 ( 1.3)
ΦΩΩααα     93672.7 ( 2.0)        ΩΩΩααα     27960.3 ( 3.0)
ΦΦΦΦαα     24119.9 ( 1.8)        ΦΦΦΩαα     86442.5 ( 2.7)
ΦΦΩΩαα    115915.5 ( 4.0)        ΦΩΩΩαα     68853.5 ( 5.9)
ΩΩΩΩαα     15261.1 ( 8.7)        ΦΦΦΦΦα      7968.1 ( 3.5)
ΦΦΦΦΩα     35537.2 ( 5.3)        ΦΦΦΩΩα     63117.5 ( 7.8)
ΦΦΩΩΩα     55684.4 (11.5)        ΦΩΩΩΩα     24325.9 (16.7)
ΩΩΩΩΩα      4190.6 (24.0)        ΦΦΦΦΦΦ      1087.1 ( 7.0)
ΦΦΦΦΦΩ      5767.0 (10.3)        ΦΦΦΦΩΩ     12637.2 (15.0)
ΦΦΦΩΩΩ     14581.7 (21.6)        ΦΦΩΩΩΩ      9290.2 (30.6)
ΦΩΩΩΩΩ      3073.9 (42.2)        ΩΩΩΩΩΩ       408.4 (56.0)

Library size = 3.0000E+06
total = 2.7885E+06    % sampled = 4.36

Class       Number    (%)        Class       Number    (%)

αααααα     10076.4 (  .3)        Φααααα     50296.9 (  .7)
Ωααααα     45190.9 ( 1.0)        ΦΦαααα    104432.2 ( 1.3)
ΦΩαααα    187345.5 ( 2.0)        ΩΩαααα     83880.9 ( 3.0)
ΦΦΦααα    115256.6 ( 2.7)        ΦΦΩααα    309107.9 ( 4.0)
ΦΩΩααα    275413.9 ( 5.9)        ΩΩΩααα     81392.5 ( 8.7)
ΦΦΦΦαα     71074.5 ( 5.3)        ΦΦΦΩαα    252470.2 ( 7.8)
ΦΦΩΩαα    334106.2 (11.5)        ΦΩΩΩαα    194606.9 (16.7)
ΩΩΩΩαα     41905.9 (24.0)        ΦΦΦΦΦα     23067.8 (10.3)
ΦΦΦΦΩα    101097.3 (15.0)        ΦΦΦΩΩα    174981.0 (21.6)
ΦΦΩΩΩα    148643.7 (30.6)        ΦΩΩΩΩα     61478.9 (42.2)
ΩΩΩΩΩα      9801.0 (56.0)        ΦΦΦΦΦΦ      3039.6 (19.5)
ΦΦΦΦΦΩ     15587.7 (27.7)        ΦΦΦΦΩΩ     32516.8 (38.5)
ΦΦΦΩΩΩ     34975.6 (51.8)        ΦΦΩΩΩΩ     20215.5 (66.6)
ΦΩΩΩΩΩ      5879.9 (80.7)        ΩΩΩΩΩΩ       667.0 (91.5)
Table 130: Sampling of a Library encoded by (NNK)6

(continued)

Library size = 1.0000E+07
total = 8.1204E+06    % sampled = 12.69

Class       Number    (%)        Class       Number    (%)

αααααα     33455.9 ( 1.1)        Φααααα    166342.4 ( 2.2)
Ωααααα    148871.1 ( 3.3)        ΦΦαααα    342685.7 ( 4.4)
ΦΩαααα    609987.6 ( 6.5)        ΩΩαααα    269958.3 ( 9.6)
ΦΦΦααα    372371.8 ( 8.6)        ΦΦΩααα    983416.4 (12.6)
ΦΩΩααα    856471.6 (18.4)        ΩΩΩααα    244761.5 (26.2)
ΦΦΦΦαα    222702.0 (16.5)        ΦΦΦΩαα    767692.5 (23.7)
ΦΦΩΩαα    972324.6 (33.3)        ΦΩΩΩαα    531651.3 (45.6)
ΩΩΩΩαα    104722.3 (59.9)        ΦΦΦΦΦα     68111.0 (30.3)
ΦΦΦΦΩα    281976.3 (41.8)        ΦΦΦΩΩα    450120.2 (55.6)
ΦΦΩΩΩα    342072.1 (70.4)        ΦΩΩΩΩα    122302.6 (83.9)
ΩΩΩΩΩα     16364.0 (93.5)        ΦΦΦΦΦΦ      8028.0 (51.4)
ΦΦΦΦΦΩ     37179.9 (66.1)        ΦΦΦΦΩΩ     67719.5 (80.3)
ΦΦΦΩΩΩ     61580.0 (91.2)        ΦΦΩΩΩΩ     29586.1 (97.4)
ΦΩΩΩΩΩ      7259.5 (99.6)        ΩΩΩΩΩΩ       728.8 (100.0)

Library size = 3.0000E+07
total = 1.8633E+07    % sampled = 29.11

Class       Number    (%)        Class       Number    (%)

αααααα     99247.4 ( 3.3)        Φααααα    487990.0 ( 6.5)
Ωααααα    431933.3 ( 9.6)        ΦΦαααα    983416.5 (12.6)
ΦΩαααα   1712943.0 (18.4)        ΩΩαααα    734284.6 (26.2)
ΦΦΦααα   1023590.0 (23.7)        ΦΦΩααα   2592866.0 (33.3)
ΦΩΩααα   2126605.0 (45.6)        ΩΩΩααα    558519.0 (59.9)
ΦΦΦΦαα    563952.6 (41.8)        ΦΦΦΩαα   1800481.0 (55.6)
ΦΦΩΩαα   2052433.0 (70.4)        ΦΩΩΩαα    978420.5 (83.9)
ΩΩΩΩαα    163640.3 (93.5)        ΦΦΦΦΦα    148719.7 (66.1)
ΦΦΦΦΩα    541755.7 (80.3)        ΦΦΦΩΩα    738960.1 (91.2)
ΦΦΩΩΩα    473377.0 (97.4)        ΦΩΩΩΩα    145189.7 (99.6)
ΩΩΩΩΩα     17491.3 (100.0)       ΦΦΦΦΦΦ     13829.1 (88.5)
ΦΦΦΦΦΩ     54058.1 (96.1)        ΦΦΦΦΩΩ     83726.0 (99.2)
ΦΦΦΩΩΩ     67454.5 (99.9)        ΦΦΩΩΩΩ     30374.5 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)
Table 130: Sampling of a Library encoded by (NNK)6

(continued)

Library size = 7.6000E+07
total = 3.2125E+07    % sampled = 50.19

Class       Number    (%)        Class       Number    (%)

αααααα    245057.8 ( 8.2)        Φααααα   1175010.0 (15.7)
Ωααααα   1014733.0 (22.7)        ΦΦαααα   2255280.0 (29.0)
ΦΩαααα   3749112.0 (40.2)        ΩΩαααα   1504128.0 (53.7)
ΦΦΦααα   2142478.0 (49.6)        ΦΦΩααα   4993247.0 (64.2)
ΦΩΩααα   3666785.0 (78.6)        ΩΩΩααα    840691.9 (90.1)
ΦΦΦΦαα   1007002.0 (74.6)        ΦΦΦΩαα   2825063.0 (87.2)
ΦΦΩΩαα   2782358.0 (95.4)        ΦΩΩΩαα   1154956.0 (99.0)
ΩΩΩΩαα    174790.0 (99.9)        ΦΦΦΦΦα    210475.6 (93.5)
ΦΦΦΦΩα    663929.3 (98.4)        ΦΦΦΩΩα    808298.6 (99.8)
ΦΦΩΩΩα    485953.2 (100.0)       ΦΩΩΩΩα    145799.9 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)       ΦΦΦΦΦΦ     15559.9 (99.6)
ΦΦΦΦΦΩ     56234.9 (100.0)       ΦΦΦΦΩΩ     84374.6 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)       ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)

Library size = 1.0000E+08
total = 3.6537E+07    % sampled = 57.09

Class       Number    (%)        Class       Number    (%)

αααααα    318185.1 (10.7)        Φααααα   1506161.0 (20.2)
Ωααααα   1284677.0 (28.7)        ΦΦαααα   2821285.0 (36.3)
ΦΩαααα   4585163.0 (49.1)        ΩΩαααα   1783932.0 (63.7)
ΦΦΦααα   2566085.0 (59.4)        ΦΦΩααα   5764391.0 (74.1)
ΦΩΩααα   4051713.0 (86.8)        ΩΩΩααα    888584.3 (95.2)
ΦΦΦΦαα   1127473.0 (83.5)        ΦΦΦΩαα   3023170.0 (93.3)
ΦΦΩΩαα   2865517.0 (98.3)        ΦΩΩΩαα   1163743.0 (99.8)
ΩΩΩΩαα    174941.0 (100.0)       ΦΦΦΦΦα    218886.6 (97.3)
ΦΦΦΦΩα    671976.9 (99.6)        ΦΦΦΩΩα    809757.3 (100.0)
ΦΦΩΩΩα    485997.5 (100.0)       ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)       ΦΦΦΦΦΦ     15613.5 (99.9)
ΦΦΦΦΦΩ     56248.9 (100.0)       ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)       ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)
Table 130: Sampling of a Library encoded by (NNK)6

(continued)

Library size = 3.0000E+08
total = 5.2634E+07    % sampled = 82.24

Class       Number    (%)        Class       Number    (%)

αααααα    856451.3 (28.7)        Φααααα   3668130.0 (49.1)
Ωααααα   2854291.0 (63.7)        ΦΦαααα   5764391.0 (74.1)
ΦΩαααα   8103426.0 (86.8)        ΩΩαααα   2665753.0 (95.2)
ΦΦΦααα   4030893.0 (93.3)        ΦΦΩααα   7641378.0 (98.3)
ΦΩΩααα   4654972.0 (99.8)        ΩΩΩααα    933018.6 (100.0)
ΦΦΦΦαα   1343954.0 (99.6)        ΦΦΦΩαα   3239029.0 (100.0)
ΦΦΩΩαα   2915985.0 (100.0)       ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)       ΦΦΦΦΦα    224995.5 (100.0)
ΦΦΦΦΩα    674999.9 (100.0)       ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)       ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)       ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)       ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)       ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)

Library size = 1.0000E+09
total = 6.1999E+07    % sampled = 96.87

Class       Number    (%)        Class       Number    (%)

αααααα   2018278.0 (67.6)        Φααααα   6680917.0 (89.5)
Ωααααα   4326519.0 (96.6)        ΦΦαααα   7690221.0 (98.9)
ΦΩαααα   9320389.0 (99.9)        ΩΩαααα   2799250.0 (100.0)
ΦΦΦααα   4319475.0 (100.0)       ΦΦΩααα   7775990.0 (100.0)
ΦΩΩααα   4665600.0 (100.0)       ΩΩΩααα    933120.0 (100.0)
ΦΦΦΦαα   1350000.0 (100.0)       ΦΦΦΩαα   3240000.0 (100.0)
ΦΦΩΩαα   2916000.0 (100.0)       ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)       ΦΦΦΦΦα    225000.0 (100.0)
ΦΦΦΦΩα    675000.0 (100.0)       ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)       ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)       ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)       ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)       ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)
Table 130: Sampling of a Library encoded by (NNK)6
(continued)

Library size = 3.0000E+09
total = 6.3890E+07    % sampled = 99.83

Class       Number    (%)        Class       Number    (%)

αααααα   2884346.0 (96.6)        Φααααα   7456311.0 (99.9)
Ωααααα   4478800.0 (100.0)       ΦΦαααα   7775990.0 (100.0)
ΦΩαααα   9331200.0 (100.0)       ΩΩαααα   2799360.0 (100.0)
ΦΦΦααα   4320000.0 (100.0)       ΦΦΩααα   7776000.0 (100.0)
ΦΩΩααα   4665600.0 (100.0)       ΩΩΩααα    933120.0 (100.0)
ΦΦΦΦαα   1350000.0 (100.0)       ΦΦΦΩαα   3240000.0 (100.0)
ΦΦΩΩαα   2916000.0 (100.0)       ΦΩΩΩαα   1166400.0 (100.0)
ΩΩΩΩαα    174960.0 (100.0)       ΦΦΦΦΦα    225000.0 (100.0)
ΦΦΦΦΩα    675000.0 (100.0)       ΦΦΦΩΩα    810000.0 (100.0)
ΦΦΩΩΩα    486000.0 (100.0)       ΦΩΩΩΩα    145800.0 (100.0)
ΩΩΩΩΩα     17496.0 (100.0)       ΦΦΦΦΦΦ     15625.0 (100.0)
ΦΦΦΦΦΩ     56250.0 (100.0)       ΦΦΦΦΩΩ     84375.0 (100.0)
ΦΦΦΩΩΩ     67500.0 (100.0)       ΦΦΩΩΩΩ     30375.0 (100.0)
ΦΩΩΩΩΩ      7290.0 (100.0)       ΩΩΩΩΩΩ       729.0 (100.0)
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Table 130, continued 

D. Formulae for tabulated quantities.

Lsize is the number of independent transformants.

31**6 is 31 to the sixth power; 6*3 means 6 times 3.

A = Lsize/(31**6)

a can be one of [WMFYCIKDENHQ.]

* can be one of [PTAVG]

Q can be one of [SLR]

F0 = (12)**6    F1 = (12)**5    F2 = (12)**4
F3 = (12)**3    F4 = (12)**2    F5 = (12)
F6 = 1

aaaaaa = F0 * (1-exp(-A))
*aaaaa = 6 * 5 * F1 * (1-exp(-2*A))
Qaaaaa = 6 * 3 * F1 * (1-exp(-3*A))
**aaaa = (15) * 5**2 * F2 * (1-exp(-4*A))
*Qaaaa = (6*5) * 5*3 * F2 * (1-exp(-6*A))
QQaaaa = (15) * 3**2 * F2 * (1-exp(-9*A))
***aaa = (20) * (5**3) * F3 * (1-exp(-8*A))
**Qaaa = (60) * (5*5*3) * F3 * (1-exp(-12*A))
*QQaaa = (60) * (5*3*3) * F3 * (1-exp(-18*A))
QQQaaa = (20) * (3)**3 * F3 * (1-exp(-27*A))
****aa = (15) * (5)**4 * F4 * (1-exp(-16*A))
***Qaa = (60) * (5)**3 * 3 * F4 * (1-exp(-24*A))
**QQaa = (90) * (5*5*3*3) * F4 * (1-exp(-36*A))
*QQQaa = (60) * (5*3*3*3) * F4 * (1-exp(-54*A))
QQQQaa = (15) * (3)**4 * F4 * (1-exp(-81*A))
*****a = (6) * (5)**5 * F5 * (1-exp(-32*A))
****Qa = 30*5*5*5*5*3 * F5 * (1-exp(-48*A))
***QQa = 60*5*5*5*3*3 * F5 * (1-exp(-72*A))
**QQQa = 60*5*5*3*3*3 * F5 * (1-exp(-108*A))
*QQQQa = 30*5*3*3*3*3 * F5 * (1-exp(-162*A))
QQQQQa = 6*3*3*3*3*3 * F5 * (1-exp(-243*A))
****** = 5**6 * (1-exp(-64*A))
*****Q = 6*3*5**5 * (1-exp(-96*A))
****QQ = 15*3*3*5**4 * (1-exp(-144*A))
***QQQ = 20*3**3*5**3 * (1-exp(-216*A))
**QQQQ = 15*3**4*5**2 * (1-exp(-324*A))
*QQQQQ = 6*3**5*5 * (1-exp(-486*A))
QQQQQQ = 3**6 * (1-exp(-729*A))

total = aaaaaa + *aaaaa + Qaaaaa + **aaaa + *Qaaaa +
        QQaaaa + ***aaa + **Qaaa + *QQaaa + QQQaaa +
        ****aa + ***Qaa + **QQaa + *QQQaa + QQQQaa +
        *****a + ****Qa + ***QQa + **QQQa + *QQQQa +
        QQQQQa + ****** + *****Q + ****QQ + ***QQQ +
        **QQQQ + *QQQQQ + QQQQQQ
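
The 28 formulae above share one pattern, which can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the original disclosure: the function name `nnk_class_table` and the keying of classes by the pair (number of two-codon positions, number of three-codon positions) are assumptions made for the example. It assumes, as the formulae do, 12 one-codon, 5 two-codon and 3 three-codon amino acids and 31 non-stop codons.

```python
from math import comb, exp


def nnk_class_table(lsize, n_codons=6):
    """Expected number of distinct amino-acid sequences seen, per class.

    Keys are (n_star, n_q): the number of positions holding a two-codon (*)
    or three-codon (Q) amino acid; the remaining positions hold a one-codon
    amino acid (a). Values are (expected number seen, percent of class seen).
    """
    a_ratio = lsize / 31 ** n_codons  # A = Lsize/(31**6) when n_codons is 6
    table = {}
    for n_star in range(n_codons + 1):
        for n_q in range(n_codons + 1 - n_star):
            n_a = n_codons - n_star - n_q
            # Ways to place the * and Q positions among the varied codons.
            arrangements = comb(n_codons, n_star) * comb(n_codons - n_star, n_q)
            # Distinct amino-acid sequences in this class.
            n_seqs = arrangements * 5 ** n_star * 3 ** n_q * 12 ** n_a
            # Each such sequence is encoded by this many DNA sequences.
            dna_per_seq = 2 ** n_star * 3 ** n_q
            frac = 1.0 - exp(-dna_per_seq * a_ratio)
            table[(n_star, n_q)] = (n_seqs * frac, 100.0 * frac)
    return table


table = nnk_class_table(1.0e9)
total = sum(expected for expected, _ in table.values())
percent_sampled = 100.0 * total / 20 ** 6
```

For a library of 1.0E+09 transformants the sketch gives values that agree with the corresponding tabulated entries, for example the all-`a` class at about 67.6% sampled.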
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Table 131: Sampling of a Library 
Encoded by (NNT)4(NNG)2

X can be F,S,Y,C,L,P,H,R,I,T,N,V,A,D,G

Γ can be L²,R²,S,W,P,Q,M,T,K,V,A,E,G

Library comprises 8.55·10^6 amino-acid sequences; 1.47·10^7 DNA
sequences.

Total number of possible aa sequences = 8,555,625

x    LVPTARGFYCHIND
S    S
θ    VPTAGWQMKES
Q    LR

The first, second, fifth, and sixth positions can hold
x or S; the third and fourth positions can hold θ or Q. I
have lumped sequences by the number of xs, Ss, θs, and Qs.

For example xxθQSS stands for:

[xxθQSS, xSθQxS, xSθQSx, SSθQxx, SxθQxS, SxθQSx,
 xxQθSS, xSQθxS, xSQθSx, SSQθxx, SxQθxS, SxQθSx]

The following table shows the likelihood that any
particular DNA sequence will fall into one of the defined
classes.

Library size = 1.0        Sampling = .00001%

total.....  1.0000E+00    %sampled..  1.1688E-07
xxθθxx....  3.1524E-01    xxθQxx....  2.2926E-01
xxQQxx....  4.1684E-02    xxθθxS....  1.8013E-01
xxθQxS....  1.3101E-01    xxQQxS....  2.3819E-02
xxθθSS....  3.8600E-02    xxθQSS....  2.8073E-02
xxQQSS....  5.1042E-03    xSθθSS....  3.6762E-03
xSθQSS....  2.6736E-03    xSQQSS....  4.8611E-04
SSθθSS....  1.3129E-04    SSθQSS....  9.5486E-05
SSQQSS....  1.7361E-05
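
The class likelihoods tabulated above follow from simple codon counting, sketched here in Python. This is a hedged illustration rather than the author's program: the function name `class_probability` is hypothetical. It assumes that NNT gives 16 codons, of which 14 encode single-codon amino acids (x) and 2 encode serine (S), and that NNG, with its one stop codon excluded, gives 15 codons, of which 11 encode single-codon amino acids and 4 encode L or R (Q).

```python
from math import comb


def class_probability(n_s, n_q):
    """Probability that a random stop-free DNA sequence falls in a class.

    n_s: number of S among positions 1, 2, 5 and 6 (0 to 4).
    n_q: number of Q (L or R) among positions 3 and 4 (0 to 2).
    """
    # NNT positions: 14 of 16 codons give x, 2 of 16 give S.
    p_outer = comb(4, n_s) * (14 / 16) ** (4 - n_s) * (2 / 16) ** n_s
    # NNG positions: 11 of 15 non-stop codons are single-codon, 4 of 15 give Q.
    p_inner = comb(2, n_q) * (11 / 15) ** (2 - n_q) * (4 / 15) ** n_q
    return p_outer * p_inner


probs = {(s, q): class_probability(s, q) for s in range(5) for q in range(3)}
```

The fifteen probabilities sum to one, and the class with no S and no Q comes out at about 0.315, matching the first entry of the table.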
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Table 131: Sampling of a Library 
Encoded by (NNT)4(NNG)2
(continued)

The following sections show how many sequences of each
class are expected for libraries of different sizes.

Library size = 1.0000E+05

total = 9.9137E+04        fraction sampled = 1.1587E-02

Type        Number     %         Type        Number     %
xxθθxx     31416.9 (  0.7)      xxθQxx     22771.4 (  1.3)
xxQQxx      4112.4 (  2.7)      xxθθxS     17891.8 (  1.3)
xxθQxS     12924.6 (  2.7)      xxQQxS      2318.5 (  5.3)
xxθθSS      3808.1 (  2.7)      xxθQSS      2732.5 (  5.3)
xxQQSS       483.7 ( 10.3)      xSθθSS       357.8 (  5.3)
xSθQSS       253.4 ( 10.3)      xSQQSS        43.7 ( 19.5)
SSθθSS        12.4 ( 10.3)      SSθQSS         8.6 ( 19.5)
SSQQSS         1.4 ( 35.2)

Library size = 1.0000E+06

total = 9.2064E+05        fraction sampled = 1.0761E-01

xxθθxx    304783.9 (  6.6)      xxθQxx    214394.0 ( 12.7)
xxQQxx     36508.6 ( 23.8)      xxθθxS    168452.5 ( 12.7)
xxθQxS    114741.4 ( 23.8)      xxQQxS     18383.8 ( 41.9)
xxθθSS     33807.7 ( 23.8)      xxθQSS     21666.6 ( 41.9)
xxQQSS      3114.6 ( 66.2)      xSθθSS      2837.3 ( 41.9)
xSθQSS      1631.5 ( 66.2)      xSQQSS       198.4 ( 88.6)
SSθθSS        80.1 ( 66.2)      SSθQSS        39.0 ( 88.6)
SSQQSS         3.9 ( 98.7)

Library size = 3.0000E+06

total = 2.3880E+06        fraction sampled = 2.7912E-01

xxθθxx    855709.5 ( 18.4)      xxθQxx    565051.6 ( 33.4)
xxQQxx     85564.7 ( 55.7)      xxθθxS    443969.1 ( 33.4)
xxθQxS    268917.8 ( 55.7)      xxQQxS     35281.3 ( 80.4)
xxθθSS     79234.7 ( 55.7)      xxθQSS     41581.5 ( 80.4)
xxQQSS      4522.6 ( 96.1)      xSθθSS      5445.2 ( 80.4)
xSθQSS      2369.0 ( 96.1)      xSQQSS       223.7 ( 99.9)
SSθθSS       116.3 ( 96.1)      SSθQSS        43.9 ( 99.9)
SSQQSS         4.0 (100.0)
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Table 131: Sampling of a Library- 
Encoded by (NNT)4(NNG)2
(continued)

Library size = 8.5556E+06

total = 4.9303E+06        fraction sampled = 5.7626E-01

xxθθxx   2046301.0 ( 44.0)      xxθQxx   1160645.0 ( 68.7)
xxQQxx    138575.9 ( 90.2)      xxθθxS    911935.6 ( 68.7)
xxθQxS    435524.3 ( 90.2)      xxQQxS     43480.7 ( 99.0)
xxθθSS    128324.1 ( 90.2)      xxθQSS     51245.1 ( 99.0)
xxQQSS      4703.6 (100.0)      xSθθSS      6710.7 ( 99.0)
xSθQSS      2463.8 (100.0)      xSQQSS       224.0 (100.0)
SSθθSS       121.0 (100.0)      SSθQSS        44.0 (100.0)
SSQQSS         4.0 (100.0)

Library size = 1.0000E+07

total = 5.3667E+06        fraction sampled = 6.2727E-01

xxθθxx   2289093.0 ( 49.2)      xxθQxx   1254877.0 ( 74.2)
xxQQxx    143467.0 ( 93.4)      xxθθxS    985974.9 ( 74.2)
xxθQxS    450896.3 ( 93.4)      xxQQxS     43710.7 ( 99.6)
xxθθSS    132853.4 ( 93.4)      xxθQSS     51516.1 ( 99.6)
xxQQSS      4703.9 (100.0)      xSθθSS      6746.2 ( 99.6)
xSθQSS      2464.0 (100.0)      xSQQSS       224.0 (100.0)
SSθθSS       121.0 (100.0)      SSθQSS        44.0 (100.0)
SSQQSS         4.0 (100.0)

Library size = 3.0000E+07

total = 7.8961E+06        fraction sampled = 9.2291E-01

xxθθxx   4040589.0 ( 86.9)      xxθQxx   1661409.0 ( 98.3)
xxQQxx    153619.1 (100.0)      xxθθxS   1305393.0 ( 98.3)
xxθQxS    482802.9 (100.0)      xxQQxS     43904.0 (100.0)
xxθθSS    142254.4 (100.0)      xxθQSS     51744.0 (100.0)
xxQQSS      4704.0 (100.0)      xSθθSS      6776.0 (100.0)
xSθQSS      2464.0 (100.0)      xSQQSS       224.0 (100.0)
SSθθSS       121.0 (100.0)      SSθQSS        44.0 (100.0)
SSQQSS         4.0 (100.0)
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Table 131: Sampling of a Library 
Encoded by (NNT)4(NNG)2
(continued)

Library size = 5.0000E+07

total = 8.3956E+06        fraction sampled = 9.8130E-01

xxθθxx   4491779.0 ( 96.6)      xxθQxx   1688387.0 ( 99.9)
xxQQxx    153663.8 (100.0)      xxθθxS   1326590.0 ( 99.9)
xxθQxS    482943.4 (100.0)      xxQQxS     43904.0 (100.0)
xxθθSS    142295.8 (100.0)      xxθQSS     51744.0 (100.0)
xxQQSS      4704.0 (100.0)      xSθθSS      6776.0 (100.0)
xSθQSS      2464.0 (100.0)      xSQQSS       224.0 (100.0)
SSθθSS       121.0 (100.0)      SSθQSS        44.0 (100.0)
SSQQSS         4.0 (100.0)

Library size = 1.0000E+08

total = 8.5503E+06        fraction sampled = 9.9938E-01

xxθθxx   4643063.0 ( 99.9)      xxθQxx   1690302.0 (100.0)
xxQQxx    153664.0 (100.0)      xxθθxS   1328094.0 (100.0)
xxθQxS    482944.0 (100.0)      xxQQxS     43904.0 (100.0)
xxθθSS    142296.0 (100.0)      xxθQSS     51744.0 (100.0)
xxQQSS      4704.0 (100.0)      xSθθSS      6776.0 (100.0)
xSθQSS      2464.0 (100.0)      xSQQSS       224.0 (100.0)
SSθθSS       121.0 (100.0)      SSθQSS        44.0 (100.0)
SSQQSS         4.0 (100.0)
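
The expected numbers in Table 131 can be approximated with the same Poisson-style sampling expression used for Table 130. The Python below is a sketch under stated assumptions, not the original calculation: `expected_seen` and `N_DNA` are names introduced for illustration, and the model assumes each class member is encoded by 2 raised to the number of S and Q positions DNA sequences among 16**4 * 15**2 stop-free DNA sequences.

```python
from math import comb, exp

# Stop-free DNA sequences encoded by (NNT)4(NNG)2.
N_DNA = 16 ** 4 * 15 ** 2
# Distinct amino-acid sequences: 15 per NNT codon, 13 per NNG codon.
N_AA = 15 ** 4 * 13 ** 2


def expected_seen(lsize, n_s, n_q):
    """Expected number (and percent) of a class seen in lsize transformants.

    n_s: number of S among positions 1, 2, 5 and 6 (0 to 4).
    n_q: number of Q (L or R) among positions 3 and 4 (0 to 2).
    """
    # Distinct amino-acid sequences in the class.
    n_seqs = (comb(4, n_s) * 14 ** (4 - n_s)
              * comb(2, n_q) * 11 ** (2 - n_q) * 2 ** n_q)
    # S, L and R each have two codons in this scheme.
    dna_per_seq = 2 ** (n_s + n_q)
    frac = 1.0 - exp(-dna_per_seq * lsize / N_DNA)
    return n_seqs * frac, 100.0 * frac


lsize = 1.0e5
total = sum(expected_seen(lsize, s, q)[0] for s in range(5) for q in range(3))
fraction_sampled = total / N_AA
```

With `lsize = 1.0e5` the total and the fraction sampled come out close to the tabulated 9.9137E+04 and 1.1587E-02.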
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Table 132 : Relative efficiencies of 
various simple variegation codons

                     Number of codons
vgCodon              5               6               7
                     #DNA/#AA        #DNA/#AA        #DNA/#AA
                     [#DNA]          [#DNA]          [#DNA]
                     (#AA)           (#AA)           (#AA)

NNK                  8.95            13.86           21.49
assuming             [2.86·10^7]     [8.87·10^8]     [2.75·10^10]
stops vanish         (3.2·10^6)      (6.4·10^7)      (1.28·10^9)

NNT                  1.38            1.47            1.57
                     [1.05·10^6]     [1.68·10^7]     [2.68·10^8]
                     (7.59·10^5)     (1.14·10^7)     (1.71·10^8)

NNG                  2.04            2.36            2.72
assuming             [7.59·10^5]     [1.14·10^7]     [1.71·10^8]
stops vanish         (3.7·10^5)      (4.83·10^6)     (6.27·10^7)
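
The ratios in Table 132 are powers of (non-stop codons)/(amino acids encoded), which the following Python sketch makes explicit. The names `SCHEMES` and `efficiency` are illustrative assumptions; the codon and amino-acid counts (31 and 20 for NNK, 16 and 15 for NNT, 15 and 13 for NNG) are taken from the bracketed and parenthesized entries of the table.

```python
# (non-stop DNA codons, distinct amino acids) for each variegated codon.
SCHEMES = {"NNK": (31, 20), "NNT": (16, 15), "NNG": (15, 13)}


def efficiency(n_dna_codons, n_amino_acids, n_positions):
    """Return (#DNA/#AA, #DNA, #AA) for n_positions variegated codons."""
    n_dna = n_dna_codons ** n_positions
    n_aa = n_amino_acids ** n_positions
    return n_dna / n_aa, n_dna, n_aa


ratios = {
    name: [efficiency(codons, aas, n)[0] for n in (5, 6, 7)]
    for name, (codons, aas) in SCHEMES.items()
}
```

The computed ratios agree with the tabulated ones to within about 0.01; the smaller the ratio, the fewer transformants are needed to sample each amino-acid sequence.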
Table 140. Effect of anti-BPTI IgG on phage titer.

Phage          Input       +Anti-BPTI     +Anti-BPTI         Eluted Phage
Strain                                    +Protein A (a)

M13MP18        100 (b)     98             92                 7·10^[illegible]
BPTI.3         100         26             21                 6
M13MB48 (c)    100         90             36                 0.8
M13MB48 (d)    [illegible] [illegible]    [illegible]        2.6

(a) Protein A-agarose beads.

(b) Percentage of input phage measured as plaque
forming units

(c) Batch number 3

(d) Batch number 4



Table 141. Effect of anti-BPTI or protein A on phage titer

Strain 



No % +Anti- +Anti- 
Input Addition BPTI +Protein A BPTI 
; tsJ +Protein A 



20 



M13MP18 100(b) 107 105 
MUMRAafbiioo , 22 7.1Q- 3 



72 
58 



65 



25 



(a) Protein A-agarose beads

(b) Percentage of input phage measured as plaque
forming units

(c) Batch number 5
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Table 142 



Effect of anti-BPTI and non-immune serum on phage titer

Strain         Input      +Anti-BPTI   +NRS (a)   +Anti-BPTI        +NRS
                                                  +Protein A (b)    +Protein A

M13MP18        100 (c)    65           104        71                88
M13MB48 (d)    100        30           125        13                121
M13MB48 (e)    100        2            105        0.7               110

(a) Purified IgG from normal rabbit serum.

(b) Protein A-agarose beads.

(c) Percentage of input phage measured as plaque
forming units

(d) Batch number 4
(e) Batch number 5



Table 143: Loss in titer of display phage with
anhydrotrypsin.

               Anhydrotrypsin              Streptavidin Beads
Strain         Start       Post            Start       Post
                           Incubation                  Incubation

M13MP18        100 (a)     121             ND          ND
M13MB48        100         58              100         98
5AA Pool       100         44              100         93

(a) Plaque forming units expressed as a percentage of input.
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Table 144. Binding of Display Phage to Anhydrotrypsin. 



Experiment 1.

Strain          Eluted Phage (a)    Relative to M13MP18
M13MP18         0.2                 1.0
BPTI-IIIMK      7.9                 39.5
M13MB48         11.2                56.0

Experiment 2.

Strain          Eluted Phage (a)    Relative to M13mp18
M13mp18         0.3                 1.0
BPTI-IIIMK      12.0                40.0
M13MB56         17.0                56.7

(a) Plaque forming units acid eluted from beads, expressed
as a percentage of the input.

Table 145. Binding of Display Phage to Anhydrotrypsin or
Trypsin.

               Anhydrotrypsin Beads          Trypsin Beads
Strain         Eluted        Relative        Eluted        Relative
               Phage (a)     Binding (b)     Phage         Binding

M13MP18        0.1           1               2.3x10^-4     1.0
BPTI-IIIMK     9.1           91              1.17          5x10^3
M13.3X7        25.0          250             1.4           6x10^3
M13.3X11       9.2           92              0.27          1.2x10^3

(a) Plaque forming units eluted from beads, expressed as a
percentage of the input.

(b) Relative to the non-display phage, M13MP18.
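
Footnote (b) defines relative binding as the eluted percentage for a strain divided by that for the non-display phage. A minimal Python sketch of that arithmetic follows, using the anhydrotrypsin column of Table 145; the function name `relative_binding` is an assumption made for illustration.

```python
def relative_binding(eluted_percent, control="M13MP18"):
    """Divide each strain's eluted percentage by that of the control phage."""
    base = eluted_percent[control]
    return {strain: value / base for strain, value in eluted_percent.items()}


# Eluted phage (percent of input) on anhydrotrypsin beads, from Table 145.
anhydrotrypsin = {
    "M13MP18": 0.1,
    "BPTI-IIIMK": 9.1,
    "M13.3X7": 25.0,
    "M13.3X11": 9.2,
}
relative = relative_binding(anhydrotrypsin)
```

The same division applied to the trypsin column (for example 1.17 divided by 2.3x10^-4) gives values on the order of 10^3, consistent with the tabulated relative binding.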



35 
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10 



Table 146. Binding of Display Phage to Trypsin or Human
Neutrophil Elastase.

               Trypsin Beads                 HNE Beads
Strain         Eluted        Relative        Eluted        Relative
               Phage (a)     Binding (b)     Phage         Binding

M13MP18        5x10^-4       1               3x10^-4       1.0
BPTI-IIIMK     1.0           2000            5x10^-3       16.7
M13MB48        0.13          260             9x10^-3       30.0
M13.3X7        1.15          2300            1x10^-3       3.3
M13.3X11       0.8           1600            2x10^-3       6.7
BPTI3.CL (c)   1x10^-3       2               4.1           1.4x10^4

(a) Plaque forming units acid eluted from the beads,
expressed as a percentage of input.

(b) Relative to the non-display phage, M13MP18.

(c) BPTI-IIIMK (K15L MGNG)
Table 820: Streptavidin-Binding Phage

Putative Streptavidin 

Name Binding Peptide Seq. 

DEV(P) AE-P CHPOYRIiC ORPLKOPPPPPPAB... 

Dev(E) A E - L CHPOFPRC NL FRKVPP P P P P A E . . . 

HPQ6 A E 6 P CHPOFPRC YIEGRIV - E. . . 

11111111112222222 
12345678901234567890123 4 5 6 
- - - - C c ------------ - E 
Table 827: Effect of DTT on biotinylated HRP
binding to streptavidin agarose.

Cone. DTT Biotin-HRP Color 
(mM) nwmlBPaent , — 

0 
2 
10 
20 
£0 

Table 828: Effect of DTT on
HPQ6 display phage infectivity.



DTT (irtM) Putative TnfectivitV 

0 1.00 
2 - . 0 . 95- :. _,. . 

10 1.00 
20 1-08 



++++ 
++++ 
+++ 
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SO 






1.23 






Table 829: Effect of the presence of DTT on the binding of
display phage to streptavidin agarose beads.

Inputs: MKTN 4.2 x 10^[illegible], HPQ6 3.3 x 10^[illegible].

Name     Concn DTT     Fraction Bound           Relative
         (µM)                                   Binding

MKTN     0             4.8 x 10^-6              1.00
         2,000         5.4 x 10^[illegible]
         10,000        5.2 x 10^[illegible]

HPQ6     0             1.6 x 10^-4              1.00
         1             1.6 x 10^-4              1.00
         10            1.5 x 10^-4              0.90
         100           9.7 x 10^-5              0.60
         1000          1.6 x 10^-5              0.10
         2,000         1.0 x 10^-5              0.06
         5,000         8.8 x 10^-6              0.05
0 
Table 832: Effect of preincubating HPQ6 display
phage with Factor Xa on binding to streptavidin beads.



Factor 3S. 



Titer after 
Treatment 



Relative 

Titer 



fr^Unn Bound 



Relative 
Binding 



3.3 x 10" 

3.3 x 10" 



1 

_1_ 



1.4 X 10-* 

1-2 x IP" 5 



3C IP' 2 



Table 833: FXa treatment of HPQ6 display phage
following binding to streptavidin.



Factor X, 



Total 
Fraction. 
Bound 



Fraction 
Eluted 

fvy Treatment 



% Removed 
by 

Treatment 



7*6 x 10 4 
x IP" 3 



1.6 x 10" 3 
3C IP" 3 



14 



Table 834: Removal of HPQ6 display phage
from streptavidin by FXa.

Amount of FXa    Time      % Removed by
(units)          (hrs)     Treatment

0                1         17
2.5              1         21
6.3              1         22
12.5             1         35

0                2         18
2.5              2         53
6.3              2         54
12.5             2         52
CLAIMS

1. In a process for developing novel epitopes or
binding proteins with a desired binding activity against a
particular target material which comprises providing a
library of phage which each displays on its surface, as a
result of expression of a first phage gene, one or more
copies of a particular chimeric coat protein, each chimeric
coat protein comprising a potential epitope, or a potential
binding domain which is a mutant of a known protein domain
foreign to said phage, said library collectively displaying
a plurality of potential epitopes or binding domains,
contacting said library of phage with the target material,
and separating the phage on the basis of their affinity for
the target material, the improvement wherein said chimeric
coat protein further comprises a linker peptide which is
specifically cleavable by said site-specific protease.

2. The method of claim 1 wherein the site-specific
protease is Factor Xa, Factor XIa, kallikrein,
thrombin, Factor XIIa, collagenase or enterokinase.

3. The method of claim 1 wherein, after said
library of phage is contacted with said target material,
(1) low affinity phage are removed, (2) high affinity phage
still bound to said target material are released by
cleavage of said chimeric coat protein at said linker by
means of a site-specific protease, and the released high
affinity phage are recovered.

4. In a process for developing novel binding
peptides or proteins with a desired binding activity
against a particular target material which comprises
providing a library of phage which each displays on its
surface, as a result of expression of a first phage gene,
one or more copies of a particular chimeric coat protein,
each chimeric coat protein comprising a mutant of a known
protein domain foreign to said phage, said library
collectively displaying a plurality of potential binding
domains, contacting said library of phage with the target
material, and separating the phage on the basis of their
affinity for the target material, the improvement wherein
said potential binding domain has at least one
intrachain covalent crosslink between a first amino acid
position and a second amino acid position thereof, the
amino acids at said first and second positions being
invariant in all of the chimeric proteins displayed by said
library, and where low affinity phage are removed from said
target material first, and then high affinity phage are
released or rendered more readily eluted from the target
material, by treating the phage with a reagent which cleaves
the crosslink, preferably a reagent which does not kill the
phage.

5. The method of any of claims 1-4 wherein the
domain is a mini-protein of less than sixty amino acids,
more preferably a micro-protein of less than forty amino
acids.

6. The method of claim 4 wherein the crosslink
is a disulfide bond and the amino acids at said first and
second positions are cysteines.

7. The method of claim 6 wherein the reagent is
dithiothreitol.

8. The method of any of claims 1-7 wherein the
phage further comprises a second phage gene encoding the
cognate wild-type coat protein of the phage.

9. In a process for developing novel epitopes
with a desired binding activity against a particular target
material which comprises providing a library of phage which
each displays on its surface, as a result of expression of
a first phage gene, one or more copies of a particular
chimeric coat protein, each chimeric coat protein
comprising a potential epitope, said library collectively
displaying a plurality of potential epitopes, contacting
said library of phage with the target material,
separating the phage on the basis of their affinity for the
target material, the improvement wherein the phage further
comprises a second phage gene encoding the cognate wild-type
coat protein of the phage.

10. In a process for developing novel epitopes
or binding proteins with a desired binding activity against
a particular target material which comprises providing a
library of phage which each displays on its surface, as a
result of expression of a first phage gene, one or more
copies of a particular chimeric coat protein, each chimeric
coat protein comprising a potential epitope, or a potential
binding domain which is a mutant of a known protein domain
foreign to said phage, said library collectively displaying
a plurality of potential epitopes or binding domains,
contacting said library of phage with the target material,
and separating the phage on the basis of their affinity for
the target material, the improvement wherein the chimeric
coat protein includes only an assemblable fragment of a
coat protein of said phage, and not that portion of the
coat protein which is responsible for pilus binding, and
the phage also comprises a second phage gene encoding the
cognate native coat protein of the phage.

11. In a process for developing novel epitopes
with a desired binding activity against a particular target
material which comprises providing a library of phage which
each displays on its surface, as a result of expression of
a first phage gene, one or more copies of a particular
chimeric coat protein, each chimeric coat protein
comprising a potential epitope, said library collectively
displaying a plurality of potential epitopes, contacting
said library of phage with the target material, and
separating the phage on the basis of their affinity for the
target material, the improvement wherein the cognate wild-type
coat protein of the phage is the major coat protein of
the phage.

12. The method of claims 8-11 wherein the
initiation codon of the second phage gene is a leucine codon.

13. The method of any of claims 1-12 wherein the
first phage gene further comprises a cytoplasmic secretion
signal sequence which codes for a signal peptide which
directs the immediate expression product to the inner
membrane of the bacterial host cell infected by said phage,
where it is processed to remove said signal peptide,
yielding a mature chimeric coat protein comprising the
potential binding domain and at least a portion of a
geneVIII-like protein of the phage, said chimeric protein
being assembled with wild-type coat protein into the phage
coat, wherein the secretion signal is encoded by a signal
sequence selected from the group consisting of the signal
sequences of the phoA, bla and geneIII genes.

14. In a process for developing novel epitopes
or binding proteins with a desired binding activity against
a particular target material which comprises providing a
library of phage which each displays on its surface, as a
result of expression of a first phage gene, one or more
copies of a particular chimeric coat protein, each chimeric
coat protein comprising a potential epitope, or a potential
binding domain which is a mutant of a known protein domain
foreign to said phage, said library collectively displaying
a plurality of potential epitopes or binding domains,
contacting said library of phage with the target material,
and separating the phage on the basis of their affinity for
the target material, the improvement wherein the phage also
comprises a second phage gene encoding the cognate native
coat protein of the phage, and the initiation codon of the
second phage gene is a leucine codon.
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15. The method of any of claims 1-14 wherein the 
differentiation among said plurality of different potential
binding domains occurs through the at least partially
random variation of one or more predetermined amino acid
positions of said known domain to randomly obtain at each
said position an amino acid belonging to a predetermined
set of two or more amino acids, the amino acids of said set
occurring at said position in predetermined expected
proportions.

16. The method of claim 15 wherein the differentiation
among said potential binding domains of said library is limited to
no more than about 20 predetermined amino acid residues of said
sequence.

17. The method of claim 15 wherein, for each set, the
ratio of the probability of occurrence of the most favored amino
acid to that for the least favored amino acid is less than about
2.6.

18. The method of any of claims 1-17 wherein, for any
potentially encoded potential binding domain, the probability that
it will be displayed by at least one package in said population is
at least 50%, more preferably at least 90%.

19. The method of any of claims 1-18 wherein said
population is characterized by the display of at least 10^5
different potential binding domains.

20. The method of any of claims 1-19 wherein the
initially chosen parental potential binding domain is selected from
the group consisting of (a) binding domains of bovine pancreatic
trypsin inhibitor, crambin, Cucurbita maxima trypsin inhibitor III,
a heat-stable enterotoxin of Escherichia coli, an alpha-, mu- or
omega-conotoxin, apamin, charybdotoxin, secretory leukocyte
protease inhibitor, cystatin, eglin, barley protease inhibitor,
ovomucoid, T4 lysozyme, hen egg white lysozyme, ribonuclease,
azurin, tumor necrosis factor, and CD4, and (b) domains at least
substantially homologous with any of the foregoing domains which
have a melting point of at least 50°C.

21. In a process for developing novel epitopes with a
desired affinity for a particular binding protein target material
which comprises providing a library of phage which each displays on
its surface, as a result of expression of a first phage gene, one
or more copies of a particular chimeric coat protein, each chimeric
coat protein comprising a potential epitope, or a potential binding
domain which is a mutant of a known protein domain foreign to said
phage, said library collectively displaying a plurality of
potential epitopes or binding domains, contacting said library of
phage with the target material, and separating the phage on the
basis of their affinity for the binding protein target material,
the improvement wherein the differentiation among said plurality
of different potential binding domains occurs through the at least
partially random variation of one or more predetermined amino acid
positions of said known domain to randomly obtain at each said
position an amino acid belonging to a predetermined set of two or
more amino acids, the amino acids of said set occurring at said
position in predetermined expected proportions, and in
substantially all sets the ratio of the frequency of occurrence of
the most favored amino acid to that for the least favored amino
acid is less than 2.6.

22. The method of claim 21 in which at least one
variable amino acid position is encoded by a simply variegated
codon selected from the group consisting of NNT, NNG, RNG, RMG,
VNT, RRS, and SNT.

23. The method of claim 21 wherein none of the variable
amino acid positions is encoded by a simply variegated codon
selected from the group consisting of NNN, NNK and NNS.

24. The method of claim 21 in which at least one
variable amino acid position is encoded by a complexly variegated
codon.
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25. A library of display phage which each displays on 
its surface, as a result of expression of a first phage gene, one
or more copies of a particular chimeric coat protein, each chimeric
coat protein comprising a potential epitope, or a potential binding
domain which is a mutant of a known protein domain foreign to said
phage, said library collectively displaying a plurality of
potential epitopes or binding domains, contacting said library of
phage with the target material, and separating the phage on the
basis of their affinity for the binding protein target material,
wherein the differentiation among said plurality of different
potential binding domains occurs through the at least partially
random variation of one or more predetermined amino acid positions
of said known domain to randomly obtain at each said position an
amino acid belonging to a predetermined set of two or more amino
acids, the amino acids of said set occurring at said position in
predetermined expected proportions, and in substantially all sets
the ratio of the frequency of occurrence of the most favored amino
acid to that for the least favored amino acid is less than 2.6.

26. A library of display phage which each displays on
its surface, as a result of expression of a first phage gene, one
or more copies of a particular chimeric coat protein, each chimeric
coat protein comprising a potential epitope, or a potential binding
domain which is a mutant of a known protein domain foreign to said
phage, said library collectively displaying a plurality of
potential epitopes or binding domains, contacting said library of
phage with the target material, and separating the phage on the
basis of their affinity for the target material, wherein said
chimeric coat protein further comprises a linker peptide which is
specifically cleavable by said site-specific protease.